FlexSearch v0.8 (Preview)

npm install flexsearch@latest

What's New

  • Persistent indexes support for: IndexedDB (Browser), Redis, SQLite, Postgres, MongoDB, Clickhouse
  • Enhanced language customization via the new Encoder class
  • Result Highlighting
  • Query performance up to 4.5 times faster than the previous generation v0.7.x, while also improving the quality of results
  • Enhanced support for larger indexes or larger result sets
  • Improved offset and limit processing achieves up to 100 times faster traversal through large datasets
  • Support for larger In-Memory indexes with extended key size (the default maximum keystore limit is 2^24)
  • Greatly enhanced performance of the whole text encoding pipeline
  • Improved indexing of numeric content (Triplets)
  • Intermediate result sets and Resolver
  • Basic Resolver: and, or, xor, not, limit, offset, boost, resolve
  • Improved charset collection
  • New charset preset soundex which further reduces memory consumption while also increasing "fuzziness"
  • Performance gain when queuing tasks to the index by using "Event-Loop-Caches"
  • Up to 100 times faster deletion/replacement when not using the additional "fastupdate" register
  • Regex Pre-Compilation (transforms hundreds of regex rules into just a few)
  • Extended support for multiple tags (DocumentIndex)
  • Custom Fields ("Virtual Fields")
  • Custom Filter
  • Custom Score Function
  • Added French language preset (stop-word filter, stemmer)
  • Enhanced Worker Support
  • Export / Import index in chunks
  • Improved Build System + Bundler (supported: CommonJS, ESM, Global Namespace); language packs can now also be imported in Node.js
  • Fully covering index.d.ts type definitions
  • Fast-Boot Serialization optimized for Server-Side-Rendering (PHP, Python, Ruby, Rust, Java, Go, Node.js, ...)

Compare Benchmark: 0.7.0 vs. 0.8.0

Persistent Indexes

FlexSearch provides a new storage adapter through which indexes are delegated to persistent storage.

Supported: IndexedDB (Browser), Redis, SQLite, Postgres, MongoDB, Clickhouse.

The .export() and .import() methods are still available for non-persistent In-Memory indexes.

All search capabilities are available on persistent indexes like:

  • Context-Search
  • Suggestions
  • Cursor-based Queries (Limit/Offset)
  • Scoring (supports a resolution of up to 32767 slots)
  • Document-Search
  • Partial Search
  • Multi-Tag-Search
  • Boost Fields
  • Custom Encoder
  • Resolver
  • Tokenizer (Strict, Forward, Reverse, Full)
  • Document Store (incl. enrich results)
  • Worker Threads to run in parallel
  • Auto-Balanced Cache (top queries + last queries)

All persistent variants are optimized for large indexes under heavy workload. Almost every task is streamlined to run in batch/parallel, getting the most out of the selected database engine. Whereas the In-Memory index can't share its data between different nodes when running in a cluster, every persistent storage can handle this by default.

Examples Node.js

Examples Browser

import FlexSearchIndex from "./index.js";
import Database from "./db/indexeddb/index.js";
// create an index
const index = new FlexSearchIndex();
// create db instance with optional prefix
const db = new Database("my-store");
// mount and await before transferring data
await index.mount(db);

// update the index as usual
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);

// changes are automatically committed by default
// when you need to wait for task completion,
// use the commit method explicitly:
await index.commit();

Alternatively, mount a store on index creation:

const index = new FlexSearchIndex({
    db: new Database("my-store")
});

// await the db response before accessing it the first time
await index.db;
// apply changes to the index
// ...

Query against a persistent storage just as usual:

const result = await index.search("gulliver");

Auto-Commit is enabled by default and will process changes asynchronously in batch. You can fully disable the auto-commit feature and apply commits manually:

const index = new FlexSearchIndex({
    db: new Database("my-store"),
    commit: false
});
// update the index
index.add(1, "content...");
index.update(2, "content...");
index.remove(3);

// transfer all changes to the db
await index.commit();

You can also call the commit method manually when the option commit: true is set.

Benchmark

The benchmark was measured in "terms per second".

| Store | Add | Search 1 | Search N | Replace | Remove | Not Found | Scaling |
|:---|---:|---:|---:|---:|---:|---:|:---|
| IndexedDB | 123,298 | 83,823 | 62,370 | 57,410 | 171,053 | 425,744 | No |
| Redis | 1,566,091 | 201,534 | 859,463 | 117,013 | 129,595 | 875,526 | Yes |
| SQLite | 269,812 | 29,627 | 129,735 | 174,445 | 1,406,553 | 122,566 | No |
| Postgres | 354,894 | 24,329 | 76,189 | 324,546 | 3,702,647 | 50,305 | Yes |
| MongoDB | 515,938 | 19,684 | 81,558 | 243,353 | 485,192 | 67,751 | Yes |
| Clickhouse | 1,436,992 | 11,507 | 22,196 | 931,026 | 3,276,847 | 16,644 | Yes |

Search 1: Single term query
Search N: Multi term query (Context-Search)

The benchmark was executed against a single client.

Encoder

Search capabilities highly depend on language processing. The old workflow wasn't really practicable. The new Encoder class is a huge improvement and fully replaces the encoding part. Some FlexSearch options were moved to the new Encoder instance.

New Encoding Pipeline:

  1. charset normalization
  2. custom preparation
  3. split into terms (apply includes/excludes)
  4. filter (pre-filter)
  5. matcher (substitute terms)
  6. stemmer (substitute term endings)
  7. filter (post-filter)
  8. replace chars (mapper)
  9. custom regex (replacer)
  10. letter deduplication
  11. apply finalize

Example

const encoder = new Encoder({
    normalize: true,
    dedupe: true,
    cache: true,
    include: {
        letter: true,
        number: true,
        symbol: false,
        punctuation: false,
        control: false,
        char: "@"
    }
});

Alternatively, you can use an exclude definition instead of an include:

const encoder = new Encoder({
    exclude: {
        letter: false,
        number: false,
        symbol: true,
        punctuation: true,
        control: true
    }
});

Instead of using include or exclude you can pass a regular expression to the field split:

const encoder = new Encoder({
    split: /\s+/
});

The definitions include and exclude are a replacement for split. You can only define one of these three.

Adding custom functions to the encoder pipeline:

const encoder = new Encoder({
    normalize: function(str){
        return str.toLowerCase();
    },
    prepare: function(str){
        return str.replace(/&/g, " and ");
    },
    finalize: function(arr){
        return arr.filter(term => term.length > 2);
    }
});

Assign encoder to an index:

const index = new Index({ 
    encoder: encoder
});

Define language-specific transformations:

const encoder = new Encoder({
    replacer: [
        /[´`ʼ]/g, "'"
    ],
    filter: new Set([
        "and",
    ]),
    matcher: new Map([
        ["xvi", "16"]
    ]),
    stemmer: new Map([
        ["ly", ""]
    ]),
    mapper: new Map([
        ["é", "e"]
    ])
});

Or use a predefined language and extend it with custom options:

import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset, {
    filter: false
});

Equivalent:

import EnglishBookPreset from "./lang/en.js";
const encoder = new Encoder(EnglishBookPreset);
encoder.assign({ filter: false });

Assign extensions to the encoder instance:

import LatinEncoderPreset from "./charset/latin/simple.js";
import EnglishBookPreset from "./lang/en.js";
// stack definitions to the encoder instance
const encoder = new Encoder()
    .assign(LatinEncoderPreset)
    .assign(EnglishBookPreset)
// override preset options ...
    .assign({ minlength: 3 });
// assign further presets ...

When adding extensions to the encoder, every previously assigned configuration stays intact (very much like mixins), even when assigning custom functions.

Add custom transformations to an existing encoder:

import LatinEncoderPreset from "./charset/latin/default.js";
const encoder = new Encoder(LatinEncoderPreset);
encoder.addReplacer(/[´`ʼ]/g, "'");
encoder.addFilter("and");
encoder.addMatcher("xvi", "16");
encoder.addStemmer("ly", "");
encoder.addMapper("é", "e");

Shortcut for just assigning one encoder configuration to an index:

import LatinEncoderPreset from "./charset/latin/default.js";
const index = new Index({ 
    encoder: LatinEncoderPreset
});

Resolver

Retrieve an unresolved result:

const raw = index.search("a short query", { 
    resolve: false
});

You can apply and chain different resolver methods to the raw result, e.g.:

raw.and( ... )
   .and( ... )
   .boost(2)
   .or( ... ,  ... )
   .limit(100)
   .xor( ... )
   .not( ... )
   // final resolve
   .resolve({
       limit: 10,
       offset: 0,
       enrich: true
   });

The default resolver:

const raw = index.search("a short query", { 
    resolve: false
});
const result = raw.resolve();

Or use declaration style:

import Resolver from "./resolver.js";
const raw = new Resolver({ 
    index: index,
    query: "a short query"
});
const result = raw.resolve();

Chainable Boolean Operations

The basic concept explained:

// 1. get one or multiple unresolved results
const raw1 = index.search("a short query", { 
    resolve: false
});
const raw2 = index.search("another query", {
    resolve: false,
    boost: 2
});

// 2. apply and chain resolver operations
const raw3 = raw1.and(raw2, /* ... */);
// you can access the aggregated result by raw3.result
console.log("The aggregated result is:", raw3.result)
// apply further operations ...

// 3. resolve final result
const result = raw3.resolve({
    limit: 100,
    offset: 0
});
console.log("The final result is:", result)

Use inline queries:

const result = index.search("further query", {
    // set resolve to false on the first query
    resolve: false,
    boost: 2
})
.or( // union
    index.search("a query")
    .and( // intersection
        index.search("another query", {
            boost: 2
        })
    )
)
.not( // exclusion
    index.search("some query")
)
// resolve the result
.resolve({
    limit: 100,
    offset: 0
});
The same query in declaration style:

import Resolver from "./resolver.js";
const result = new Resolver({
    index: index,
    query: "further query",
    boost: 2
})
.or({
    and: [{ // inner expression
        index: index,
        query: "a query"
    },{
        index: index,
        query: "another query",
        boost: 2
    }]
})
.not({ // exclusion
    index: index,
    query: "some query"
})
.resolve({
    limit: 100,
    offset: 0
});

When all queries are made against the same index, you can skip the index in every declaration that follows the initial new Resolver() call:

import Resolver from "./resolver.js";
const result = new Resolver({
    index: index,
    query: "a query"
})
.and({ query: "another query", boost: 2 })
.or ({ query: "further query", boost: 2 })
.not({ query: "some query" })
.resolve(100);

Custom Resolver

function CustomResolver(raw){
    // console.log(raw)
    let output;
    // generate output ...
    return output;
}

const result = index.search("a short query", { 
    resolve: CustomResolver
});

Result Highlighting

Result highlighting can only be enabled when using a Document-Index with an enabled document store. Even when you just want to add id-content pairs, you'll need to use a Document-Index for this feature (just define a simple document descriptor as shown below).

// import members from the bundle (see the "Load Library" section below)
import { Document, Charset } from "flexsearch";

// create the document index
const index = new Document({
  document: {
    store: true,
    index: [{
      field: "title",
      tokenize: "forward",
      encoder: Charset.LatinBalance
    }]
  }
});

// add data
index.add({
  "id": 1,
  "title": "Carmencita"
});
index.add({
  "id": 2,
  "title": "Le clown et ses chiens"
});

// perform a query
const result = index.search({
  query: "karmen or clown or not found",
  suggest: true,
  // set enrich to true (required)
  enrich: true,
  // highlight template
  // $1 is a placeholder for the matched partial
  highlight: "<b>$1</b>"
});

The result will look like:

[{
  "field": "title",
  "result": [{
      "id": 1,
      "doc": {
        "id": 1,
        "title": "Carmencita"
      },
      "highlight": "<b>Carmen</b>cita"
    },{
      "id": 2,
      "doc": {
        "id": 2,
        "title": "Le clown et ses chiens"
      },
      "highlight": "Le <b>clown</b> et ses chiens"
    }
  ]
}]

Big In-Memory Keystores

The default maximum keystore limit for the In-Memory index is 2^24 distinct terms/partials being stored (the so-called "cardinality"). An additional register can be enabled which divides the index into self-balanced partitions.

const index = new FlexSearchIndex({
    // e.g. set keystore range to 8-Bit:
    // 2^8 * 2^24 = 2^32 keys total
    keystore: 8 
});

You can theoretically store up to 2^88 keys (64-Bit address range).

The internal ID arrays scale automatically by using Proxy when the limit of 2^31 is reached.

Persistent storages have no keystore limit by default. You should not enable the keystore when using persistent indexes, as long as you don't stress the buffer too hard before calling index.commit().

Multi-Tag-Search

Assume this document schema (a dataset from IMDB):

{
    "tconst": "tt0000001",
    "titleType": "short",
    "primaryTitle": "Carmencita",
    "originalTitle": "Carmencita",
    "isAdult": 0,
    "startYear": "1894",
    "endYear": "",
    "runtimeMinutes": "1",
    "genres": [
        "Documentary",
        "Short"
    ]
}

An appropriate document descriptor could look like:

import LatinEncoder from "./charset/latin/simple.js";

const flexsearch = new Document({
    encoder: LatinEncoder,
    resolution: 3,
    document: {
        id: "tconst",
        //store: true, // document store
        index: [{
            field: "primaryTitle",
            tokenize: "forward"
        },{
            field: "originalTitle",
            tokenize: "forward"
        }],
        tag: [
            "startYear",
            "genres"
        ]
    }
});

The field contents of primaryTitle and originalTitle are encoded by the forward tokenizer. The field contents of startYear and genres are added as tags.

Get all entries of a specific tag:

const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: { "genres": "Documentary" },
    limit: 1000,
    offset: 0
});

Get entries of multiple tags (intersection):

const result = flexsearch.search({
    //enrich: true, // enrich documents
    tag: { 
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});

Combine tags with queries (intersection):

const result = flexsearch.search({
    query: "Carmen", // forward tokenizer
    tag: { 
        "genres": ["Documentary", "Short"],
        "startYear": "1894"
    }
});

Alternative declaration:

const result = flexsearch.search("Carmen", {
    tag: [{
        field: "genres",
        tag: ["Documentary", "Short"]
    },{
        field: "startYear",
        tag: "1894"
    }]
});

Filter Fields (Index / Tags / Datastore)

const flexsearch = new Document({
    document: {
        id: "id",
        index: [{
            // custom field:
            field: "somefield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        tag: [{
            field: "city",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }],
        store: [{
            field: "anotherfield",
            filter: function(data){
                // return false to filter out
                // return anything else to keep
                return true;
            }
        }]
    }
});

Custom Fields (Index / Tags / Datastore)

Dataset example:

{
    "id": 10001,
    "firstname": "John",
    "lastname": "Doe",
    "city": "Berlin",
    "street": "Alexanderplatz",
    "number": "1a",
    "postal": "10178"
}

You can apply custom fields derived from the data or from anything else:

const flexsearch = new Document({
    document: {
        id: "id",
        index: [{
            // custom field:
            field: "fullname",
            custom: function(data){
                // return custom string
                return data.firstname + " " + 
                       data.lastname;
            }
        },{
            // custom field:
            field: "location",
            custom: function(data){
                return data.street + " " +
                       data.number + ", " +
                       data.postal + " " +
                       data.city;
            }
        }],
        tag: [{
            // existing field
            field: "city"
        },{
            // custom field:
            field: "category",
            custom: function(data){
                let tags = [];
                // push one or multiple tags
                // ....
                return tags;
            }
        }],
        store: [{
            field: "anotherfield",
            custom: function(data){
                // return a falsy value to filter out
                // return anything else to keep it in store
                return data;
            }
        }]
    }
});

Filtering is also available in custom functions by returning false.
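A minimal sketch combining both, assuming the dataset from above (the field name fullname is illustrative):

const flexsearch = new Document({
    document: {
        id: "id",
        index: [{
            field: "fullname",
            custom: function(data){
                // return false to filter out incomplete entries
                if(!data.firstname || !data.lastname) return false;
                // otherwise return the derived field content
                return data.firstname + " " + data.lastname;
            }
        }]
    }
});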

Perform a query against the custom field as usual:

const result = flexsearch.search({
    query: "10178 Berlin Alexanderplatz",
    field: "location"
});

// or combine the custom field query with a tag:
const result2 = flexsearch.search({
    query: "john doe",
    tag: { "city": "Berlin" }
});

Custom Score Function

const index = new FlexSearchIndex({
    resolution: 10,
    score: function(content, term, term_index, partial, partial_index){
        // you'll need to return a number between 0 and resolution - 1
        // a score of 0 is the highest (best) score
        // for a resolution of 10 you can return 0 - 9
        // ... 
        return 3;
    } 
});

A common situation is that you have predefined labels related to some kind of order, e.g. importance or priority. A priority label could be high, moderate or low, so you can derive the scoring from those properties. Another example is when your content is already ordered and you would like to keep this order as relevance (see the sketch at the end of this section).

You probably won't need the parameters passed to the score function. But when needed, here are the score function's parameters explained:

  1. content is the whole content as an array of terms (encoded)
  2. term is the term currently being processed (encoded)
  3. term_index is the index of the term in the content array
  4. partial is the partial of the term currently being processed
  5. partial_index is the index position of the partial within the term

The partial params are empty when using the tokenizer strict. Let's take an example using the tokenizer full.

The content: "This is an example of partial encoding"
Assume the partial "amp" within the term "example" is currently being processed. Then your score function will be called with these parameters:

function score(content, term, term_index, partial, partial_index){
    // content       = ["this", "is", "an", "example", "of", "partial", "encoding"]
    // term          = "example"
    // term_index    = 3
    // partial       = "amp"
    // partial_index = 2
}
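A hedged sketch of the "keep existing order as relevance" case mentioned above: derive the score from the term position, so terms appearing earlier in the content rank higher. The clamp to 9 is needed because valid scores range from 0 to resolution - 1:

const index = new FlexSearchIndex({
    resolution: 10,
    score: function(content, term, term_index, partial, partial_index){
        // earlier terms get a lower score value,
        // and 0 is the highest (best) score
        return Math.min(term_index, 9);
    }
});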

Merge Document Results

By default, the result set of Field-Search has a structure grouped by field names:

[{
    field: "fieldname-1",
    result: [{
        id: 1001,
        doc: {/* stored document */}
    }]
},{
    field: "fieldname-2",
    result: [{
        id: 1001,
        doc: {/* stored document */}
    }]
},{
    field: "fieldname-3",
    result: [{
        id: 1002,
        doc: {/* stored document */}
    }]
}]

By passing the search option merge: true the result set will be merged into (grouped by id):

[{
    id: 1001,
    doc: {/* stored document */},
    field: ["fieldname-1", "fieldname-2"]
},{
    id: 1002,
    doc: {/* stored document */},
    field: ["fieldname-3"]
}]
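A minimal usage sketch, assuming a document index with an enabled document store:

const result = flexsearch.search("a query", {
    merge: true,  // group results by id
    enrich: true  // attach the stored documents
});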

Extern Worker Configuration

When using Worker indexes and also assigning custom functions to the options, e.g.:

  • Custom Encoder
  • Custom Encoder methods (normalize, prepare, finalize)
  • Custom Score (function)
  • Custom Filter (function)
  • Custom Fields (function)

... then you'll need to move your field configuration into a file which provides the configuration as a default export. The field configuration is not the whole document descriptor.

When not using custom functions in combination with Worker you can skip this part.

Since every field resolves into a dedicated Worker, every field which includes custom functions needs its own configuration file accordingly.

Let's take this document descriptor:

{
    document: {
        index: [{
            // this is the field configuration
            // ---->
            field: "custom_field",
            custom: function(data){
                return "custom field content";
            }
            // <------
        }]
    }
};

The configuration which needs to be available as a default export is:

{
    field: "custom_field",
    custom: function(data){
        return "custom field content";
    }
};

You're welcome to make suggestions on how to improve the handling of external configurations.

Example Node.js:

An external configuration for one WorkerIndex, let's assume it is located in ./custom_field.js:

const { Charset } = require("flexsearch");
const { LatinSimple } = Charset;
// it requires a default export:
module.exports = {
    encoder: LatinSimple,
    tokenize: "forward",
    // custom function:
    custom: function(data){
        return "custom field content";
    }
};

Create the Worker index using the configuration above:

const { Document } = require("flexsearch");
const flexsearch = new Document({
    worker: true,
    document: {
        index: [{
            // the field name needs to be set here
            field: "custom_field",
            // path to your config from above:
            config: "./custom_field.js",
        }]
    }
});

Browser (ESM)

An external configuration for one WorkerIndex, let's assume it is located in ./custom_field.js:

import { Charset } from "./dist/flexsearch.bundle.module.min.js";
const { LatinSimple } = Charset;
// it requires a default export:
export default {
    encoder: LatinSimple,
    tokenize: "forward",
    // custom function:
    custom: function(data){
        return "custom field content";
    }
};

Create Worker Index with the configuration above:

import { Document } from "./dist/flexsearch.bundle.module.min.js";
// you will need to await the response!
const flexsearch = await new Document({
    worker: true,
    document: {
        index: [{
            // the field name needs to be set here
            field: "custom_field",
            // Absolute URL to your config from above:
            config: "http://localhost/custom_field.js"
        }]
    }
});

An absolute URL is needed here because the WorkerIndex context is of type Blob, and relative URLs can't be resolved from within this context.

Test Case

As a test, the whole IMDB data collection was indexed, consisting of:

JSON Documents: 9,273,132
Fields: 83,458,188
Tokens: 128,898,832

The index configuration used here has 2 fields (using a bidirectional context of depth: 1), 1 custom field, 2 tags and a full datastore of all input JSON documents.

A non-Worker Document index requires 181 seconds to index all contents.
The Worker index takes just 32 seconds to index them all, processing every field and tag in parallel. For content of this size, that is quite an impressive result.

CSP-friendly Worker (Browser)

When enabling workers by just passing the option worker: true, the worker will be created through code generation under the hood. This might cause issues with strict CSP settings.

You can overcome this issue by passing the filepath to the worker file, e.g. worker: "./worker.js". The original worker file is located at src/worker/worker.js.
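A minimal sketch, assuming the worker file was copied next to your application (the path and the field name are placeholders):

const flexsearch = new Document({
    // pass a filepath instead of `true` to avoid inline code generation
    worker: "./worker.js",
    document: {
        index: ["content"]
    }
});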

Fuzzy-Search

Fuzzy search describes a basic concept of making queries more tolerant. FlexSearch provides several methods to achieve fuzziness:

  1. Use a tokenizer: forward, reverse or full
  2. Use one of the built-in encoder presets simple > balance > advanced > extra > soundex (sorted by fuzziness)
  3. Use one of the language-specific presets, e.g. /lang/en.js for en-US specific content
  4. Enable suggestions by passing the search option suggest: true

Additionally, you can apply a custom Mapper, Replacer, Stemmer or Filter, or assign a custom normalize(str), prepare(str) or finalize(arr) function to the Encoder.
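A hedged sketch combining several of these options, assuming the soundex preset is exported as Charset.LatinSoundex analogous to the presets used elsewhere in this document:

import { Index, Charset } from "./dist/flexsearch.bundle.module.min.js";
const index = new Index({
    tokenize: "forward",
    encoder: Charset.LatinSoundex
});
index.add(1, "Struldbrugs");
// a strongly misspelled query (see the table below)
const result = index.search("struhlbrogger", { suggest: true });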

Compare Fuzzy-Search Encoding

Original term which was indexed: "Struldbrugs"

| Encoder | Index Size | Matching Query Example |
|:---|---:|:---|
| LatinExact | 3.1 Mb | Struldbrugs |
| LatinDefault | 1.9 Mb | struldbrugs |
| LatinSimple | 1.8 Mb | strũldbrųĝgs |
| LatinBalance | 1.7 Mb | strultbrooks |
| LatinAdvanced | 1.6 Mb | shtruhldbrohkz |
| LatinExtra | 1.1 Mb | zdroltbrykz |
| LatinSoundex | 0.7 Mb | struhlbrogger |

The index size was measured after indexing the book "Gulliver's Travels".

Custom Encoder

Since it is very simple to create a custom Encoder, you are welcome to create your own, e.g.:

function customEncoder(content){
   // split the content into an Array of terms/tokens
   // and apply your changes to each term/token,
   // e.g. a minimal normalization:
   return content.toLowerCase()
                 .split(/\s+/)
                 .filter(term => term.length > 1);
}

const index = new Index({
   // set to strict when your tokenization was already done
   tokenize: "strict",
   encode: customEncoder
});

If you get good results, please feel free to share your encoder.

Fast-Boot Serialization for Server-Side-Rendering (PHP, Python, Ruby, Rust, Java, Go, Node.js, ...)

This is an experimental feature with limited support which might be dropped in a future release. You're welcome to give feedback.

When using server-side rendering you can create a different kind of export which boots up instantly. Especially when serving server-side rendered content, this can help to restore a static index on page load. Document-Indexes aren't supported by this method yet.

When your index is too large you should use the default export/import mechanism.

As the first step populate the FlexSearch index with your contents.

You have two options:

1. Create a function as string

const fn_string = index.serialize();

The content of fn_string is a valid JavaScript function declared as inject(index). Store it or place it somewhere in your code.

This function basically looks like:

function inject(index){
    index.reg = new Set([/* ... */]);
    index.map = new Map([/* ... */]);
    index.ctx = new Map([/* ... */]);
}

You can save this function by e.g. fs.writeFileSync("inject.js", fn_string); or place it as a string in your SSR-generated markup.

After creating the index on the client side, just call the inject method like:

const index = new Index({/* use same configuration! */});
inject(index);

That's it.

You'll need to use the same configuration as you used before the export. Any changes to the configuration require re-indexing.

2. Create just a function body as string

Alternatively, you can use a lazy function declaration by passing false to the serialize function:

const fn_body = index.serialize(false);

You will get just the function body which looks like:

index.reg = new Set([/* ... */]);
index.map = new Map([/* ... */]);
index.ctx = new Map([/* ... */]);

Now you can place this directly in your code (name your index variable index), or you can create an inject function from it, e.g.:

const inject = new Function("index", fn_body);

This function is callable like the above example:

const index = new Index();
inject(index);

Load Library (Node.js, ESM, Legacy Browser)

npm install flexsearch

The dist folder is located at: node_modules/flexsearch/dist/

| Build | File | CDN |
|:---|:---|:---|
| flexsearch.bundle.debug.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.bundle.debug.js |
| flexsearch.bundle.min.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.bundle.min.js |
| flexsearch.bundle.module.debug.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.bundle.module.debug.js |
| flexsearch.bundle.module.min.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.bundle.module.min.js |
| flexsearch.es5.debug.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.es5.debug.js |
| flexsearch.es5.min.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.es5.min.js |
| flexsearch.light.debug.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.light.debug.js |
| flexsearch.light.min.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.light.min.js |
| flexsearch.light.module.debug.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.light.module.debug.js |
| flexsearch.light.module.min.js | Download | https://cdn.jsdelivr.net/gh/nextapps-de/flexsearch@0.8.1/dist/flexsearch.light.module.min.js |
| Javascript Modules (ESM) | Download | https://github.com/nextapps-de/flexsearch/tree/0.8.1/dist/module |
| Javascript Modules Minified (ESM) | Download | https://github.com/nextapps-de/flexsearch/tree/0.8.1/dist/module-min |
| Javascript Modules Debug (ESM) | Download | https://github.com/nextapps-de/flexsearch/tree/0.8.1/dist/module-debug |
| flexsearch.custom.js | | Read more about "Custom Build" |

All debug versions provide debug information through the console and give you helpful advice in certain situations. Do not use them in production, since they are special builds containing extra debugging processes which noticeably reduce performance.

The abbreviations used at the end of the filenames indicate:

  • bundle All features included, FlexSearch is available on window.FlexSearch
  • light Only basic features are included, FlexSearch is available on window.FlexSearch
  • es5 bundle has support for EcmaScript5, FlexSearch is available on window.FlexSearch
  • module indicates that this bundle is a Javascript module (ESM), FlexSearch members are available by import { Index, Document, Worker, Encoder, Charset } from "./flexsearch.bundle.module.min.js" or alternatively using the default export import FlexSearch from "./flexsearch.bundle.module.min.js"
  • min bundle is minified
  • debug bundle has enabled debug mode and contains additional code just for debugging purposes (do not use for production)

Non-Module Bundles (ES5 Legacy)

Non-Module Bundles export all their features to the public namespace "FlexSearch" e.g. window.FlexSearch.Index or window.FlexSearch.Document.

Load the bundle by a script tag:

<script src="dist/flexsearch.bundle.min.js"></script>
<script>
  // ... access FlexSearch
  var Index = window.FlexSearch.Index;
  var index = new Index(/* ... */);
</script>

FlexSearch Members are accessible on:

var Index = window.FlexSearch.Index;
var Document = window.FlexSearch.Document;
var Encoder = window.FlexSearch.Encoder;
var Charset = window.FlexSearch.Charset;
var Resolver = window.FlexSearch.Resolver;
var Worker = window.FlexSearch.Worker;
var IdxDB = window.FlexSearch.IndexedDB;
// only exported by non-module builds:
var Language = window.FlexSearch.Language;

Load language packs:

<!-- English: -->
<script src="dist/lang/en.min.js"></script>
<!-- German: -->
<script src="dist/lang/de.min.js"></script>
<!-- French: -->
<script src="dist/lang/fr.min.js"></script>
<script>
  var EnglishEncoderPreset = window.FlexSearch.Language.en;
  var GermanEncoderPreset = window.FlexSearch.Language.de;
  var FrenchEncoderPreset = window.FlexSearch.Language.fr;
</script>

Module (ESM)

When using modules you can choose between 2 variants: flexsearch.xxx.module.min.js has all features bundled and is ready for production, whereas the folder /dist/module/ exports all features in the same structure as the source code, but with the compiler flags resolved.

Also, for each variant there exists:

  1. A debug version for the development
  2. A pre-compiled minified version for production

Use the bundled version exported as a module (default export):

<script type="module">
    import FlexSearch from "./dist/flexsearch.bundle.module.min.js";
    const index = new FlexSearch.Index(/* ... */);
</script>

Or import FlexSearch members separately by:

<script type="module">
    import { Index, Document, Encoder, Charset, Resolver, Worker, IdxDB } 
        from "./dist/flexsearch.bundle.module.min.js";
    const index = new Index(/* ... */);
</script>

Use non-bundled modules:

<script type="module">
    import Index from "./dist/module/index.js";
    import Document from "./dist/module/document.js";
    import Encoder from "./dist/module/encoder.js";
    import Charset from "./dist/module/charset.js";
    import Resolver from "./dist/module/resolver.js";
    import Worker from "./dist/module/worker.js";
    import IdxDB from "./dist/module/db/indexeddb/index.js";
    const index = new Index(/* ... */);
</script>

Language packs are accessible via:

import EnglishEncoderPreset from "./dist/module/lang/en.js";
import GermanEncoderPreset from "./dist/module/lang/de.js";
import FrenchEncoderPreset from "./dist/module/lang/fr.js";

Also, pre-compiled non-bundled production-ready modules are located in dist/module-min/, whereas the debug version is located at dist/module-debug/.

You can also load modules via CDN:

<script type="module">
    import Index from "https://unpkg.com/flexsearch@0.8.1/dist/module/index.js";
    const index = new Index(/* ... */);
</script>

Node.js

Install FlexSearch via NPM:

npm install flexsearch

Use the default export:

const FlexSearch = require("flexsearch");
const index = new FlexSearch.Index(/* ... */);

Or require FlexSearch members separately by:

const { Index, Document, Encoder, Charset, Resolver, Worker, IdxDB } = require("flexsearch");
const index = new Index(/* ... */);

When using ESM in Node.js, just use the modules as explained in the section above.

Language packs are accessible via:

const EnglishEncoderPreset = require("flexsearch/lang/en");
const GermanEncoderPreset = require("flexsearch/lang/de");
const FrenchEncoderPreset = require("flexsearch/lang/fr");

Persistent Connectors are accessible via:

const Postgres = require("flexsearch/db/postgres");
const Sqlite = require("flexsearch/db/sqlite");
const MongoDB = require("flexsearch/db/mongodb");
const Redis = require("flexsearch/db/redis");
const Clickhouse = require("flexsearch/db/clickhouse");

Custom Builds

The /src/ folder of this repository requires some compilation to resolve the build flags. These are your options:

  • Closure Compiler (Advanced Compilation) (used by this library here)
  • Babel + Plugin babel-plugin-conditional-compile (used by this library here)

You can't resolve build flags with:

  • Webpack
  • esbuild
  • rollup
  • Terser

These are some of the basic builds located in the /dist/ folder:

npm run build:bundle
npm run build:light
npm run build:module
npm run build:es5

Perform a custom build (UMD bundle) by passing build flags:

npm run build:custom SUPPORT_DOCUMENT=true SUPPORT_TAGS=true LANGUAGE_OUT=ECMASCRIPT5 POLYFILL=true

Perform a custom build in ESM module format:

npm run build:custom RELEASE=custom.module SUPPORT_DOCUMENT=true SUPPORT_TAGS=true 

Perform a debug build:

npm run build:custom DEBUG=true SUPPORT_DOCUMENT=true SUPPORT_TAGS=true 

On custom builds each build flag will be set to false by default when not passed.

The custom build will be saved to dist/flexsearch.custom.xxxx.min.js or when format is module to dist/flexsearch.custom.module.xxxx.min.js (the "xxxx" is a hash based on the used build flags).

Supported Build Flags

Feature Flags

| Flag | Values | Info |
|:---|:---|:---|
| SUPPORT_WORKER | true, false | |
| SUPPORT_ENCODER | true, false | |
| SUPPORT_CHARSET | true, false | |
| SUPPORT_CACHE | true, false | |
| SUPPORT_ASYNC | true, false | Asynchronous Rendering (supports Promises) |
| SUPPORT_STORE | true, false | |
| SUPPORT_SUGGESTION | true, false | |
| SUPPORT_SERIALIZE | true, false | |
| SUPPORT_DOCUMENT | true, false | |
| SUPPORT_TAGS | true, false | |
| SUPPORT_PERSISTENT | true, false | |
| SUPPORT_KEYSTORE | true, false | |
| SUPPORT_COMPRESSION | true, false | |
| SUPPORT_RESOLVER | true, false | |

Compiler Flags

| Flag | Values | Info |
|:---|:---|:---|
| DEBUG | true, false | Output debug information to the console (default: false) |
| RELEASE | custom, custom.module, bundle, bundle.module, es5, light, compact | |
| POLYFILL | true, false | Include Polyfills (based on LANGUAGE_OUT) |
| PROFILER | true, false | Just used for automatic performance tests |
| LANGUAGE_OUT | ECMASCRIPT3, ECMASCRIPT5, ECMASCRIPT_2015, ECMASCRIPT_2016, ECMASCRIPT_2017, ECMASCRIPT_2018, ECMASCRIPT_2019, ECMASCRIPT_2020, ECMASCRIPT_2021, ECMASCRIPT_2022, ECMASCRIPT_NEXT, STABLE | Target language |

Misc

A formula to determine a well-balanced value for the resolution is: $2 \cdot \lfloor\sqrt{content.length}\rfloor$, where content is the value passed to index.add(). Here, the maximum length over all contents should be used.
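A minimal sketch of that calculation, where contents is a placeholder for the array of all values you plan to add:

// maximum content length over all inputs
const max_length = Math.max(...contents.map(str => str.length));
// well-balanced resolution: 2 * floor(sqrt(max_length))
const index = new Index({
    resolution: 2 * Math.floor(Math.sqrt(max_length))
});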

Import / Export (In-Memory)

Persistent-Indexes and Worker-Indexes don't support Import/Export.

Export an Index or Document-Index to the folder /export/:

import { promises as fs } from "fs";

await index.export(async function(key, data){
  await fs.writeFile("./export/" + key, data, "utf8");
});

Import from folder /export/ into an Index or Document-Index:

const index = new Index({/* keep old config and place it here */});

const files = await fs.readdir("./export/");
for(let i = 0; i < files.length; i++){
  const data = await fs.readFile("./export/" + files[i], "utf8");
  await index.import(files[i], data);
}

You'll need to use the same configuration as you used before the export. Any changes to the configuration require re-indexing.

Migration

  • The index option property "minlength" has moved to the Encoder Class
  • The index option flag "optimize" was removed
  • The index option flag "lang" was replaced by the Encoder Class .assign()
  • Boost can no longer be applied upfront when indexing; instead, use the boost property dynamically on a query
  • All definitions of the old text encoding process were replaced by similar definitions (Array changed to Set, Object changed to Map). You can use helper methods like .addMatcher(char_match, char_replace) which add everything properly.
  • The default value for fastupdate is false when not passed via options
  • The method index.encode() has moved to index.encoder.encode()
  • The options charset and lang were removed from the index (replaced by Encoder.assign({...}))
  • Every charset collection (files in the folder /lang/**.js) is now exported as a config object (instead of a function). This config needs to be passed to the constructor new Encoder(config) or can be added to an existing instance via encoder.assign(config). The reason was to keep the default encoder configuration when having multiple document indexes.
  • The property bool from DocumentOptions was removed (replaced by Resolver)
  • The static methods FlexSearch.registerCharset() and FlexSearch.registerLanguage() were removed; those collections are now exported to FlexSearch.Charset, which can be accessed as a module via import { Charset } from "flexsearch", and language packs are now applied by encoder.assign()
  • Instead of e.g. "latin:simple", the Charset collection is exported as a module and has to be imported by e.g. import LatinSimple from "./charset.js" and then assigned to an existing Encoder via encoder.assign(LatinSimple) or on creation via encoder = new Encoder(LatinSimple)
  • You can import language packs from dist/module/lang/* when using ESM, or via const EnglishPreset = require("flexsearch/lang/en"); when using CommonJS (Node.js)
  • The method index.append() is now deprecated and will be removed in the near future, because it isn't consistent and leads to unexpected behavior when not used properly. You should only use index.add() to push contents to the index.
  • The async variants like .searchAsync are now deprecated (but still work). Asynchronous responses will always be returned by Worker-Indexes and Persistent-Indexes; everything else returns a non-promised result. Offering both method types suggests developers can choose between them, but they can't.
  • Exports made with versions below v0.8 are not compatible for import into v0.8

What's next?

Unfortunately, not everything could be finished; the remainder needs to be done in an upcoming version.

  • The Resolver currently does not support Document-Indexes, there is still some work to do.
  • Config serialization for persistent indexes (store configuration, check migrations, import and restore field configurations)
  • Tooling for persistent indexes (list all tables, remove tables)