
Commit c4a389b
update docs
1 parent 8f2aeea commit c4a389b

File tree: 8 files changed (+224, -6 lines)

README.md (+3)

@@ -68,6 +68,9 @@ Before our installation, make sure you have installed the Tensorflow
 pip install delta-nlp
 ```
 
+Follow the usage steps here if you install by pip:
+[A Text Classification Usage Example for pip users](docs/tutorials/training/text_class_pip_example.md)
+
 ### Installation from Source Code
 
 To install from the source code, we use [conda](https://conda.io/) to

docs/index.rst (+3, -1)

@@ -18,6 +18,7 @@ Welcome to DELTA's documentation!
    :caption: Installation
    :name: sec-install
 
+   installation/pick_install
    installation/using_docker
    installation/manual_setup
    installation/deltann_compile
@@ -31,7 +32,8 @@ Welcome to DELTA's documentation!
 
    tutorials/training/egs
    tutorials/training/speech_features
-   tutorials/training/text_class_example
+   tutorials/training/text_class_pip_example
+   tutorials/training/text_class_source_example
    tutorials/training/data/asr_example
    tutorials/training/data/emotion-speech-cls
    tutorials/training/data/kws-cls
docs/installation/install_from_source.md (+48, new file)

# Install from the source code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options to install DELTA: the `nlp` version and the `full`
version. The `nlp` version has minimal requirements and installs only the
NLP related packages:

```shell
# Run the installation script for the NLP version, with CPU or GPU.
cd tools
./install/install-delta.sh nlp [cpu|gpu]
```

**Note**: Users from mainland China may need to set up conda mirror sources; see [./tools/install/install-delta.sh](tools/install/install-delta.sh) for details.

If you want to use both the NLP and speech packages, install the `full` version. The full version needs the [Kaldi](https://github.com/kaldi-asr/kaldi) library, which can be pre-installed or installed by our installation script.

```shell
cd tools
# If you have already installed Kaldi
KALDI=/your/path/to/Kaldi ./install/install-delta.sh full [cpu|gpu]
# If you have not installed Kaldi, use the following command instead
# ./install/install-delta.sh full [cpu|gpu]
```

To verify the installation, run:

```shell
# Activate the conda environment
conda activate delta-py3.6-tf2.0.0
# Or use the following command if your conda version is < 4.6
# source activate delta-py3.6-tf2.0.0

# Add the DELTA environment
source env.sh

# Generate mock data for text classification.
pushd egs/mock_text_cls_data/text_cls/v1
./run.sh
popd

# Train the model
python3 delta/main.py --cmd train_and_eval --config egs/mock_text_cls_data/text_cls/v1/config/han-cls.yml
```
docs/installation/pick_install.md (+37, new file)

# Pick an installation method

## Multiple installation methods

Currently we support multiple ways to install `DELTA`. Please choose the
installation method that suits your usage and needs.

## Install by pip

For a **quick demo of the features**, or for **pure NLP users**, you can
install the `nlp` version of `DELTA` by pip with a simple command:

```bash
pip install delta-nlp
```

Check here for
[the tutorial for usage of `delta-nlp`](tutorials/training/text_class_pip_example).

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
MacOS or Linux.

## Install from the source code

For users who need the **whole functionality of DELTA** (including speech
and NLP), you can clone our repository and install from the source code.

Please follow the steps here: [Install from the source code](installation/install_from_source)

## Use docker

For users who are **comfortable using Docker**, you can pull our images
directly. This may be the best choice for Docker users.

Please follow the steps here:
[Installation using Docker](installation/using_docker)

docs/installation/using_docker.md (+1, -1)

@@ -1,4 +1,4 @@
-# Intallation using Docker
+# Installation using Docker
 
 You can directly pull the pre-built docker images for DELTA and DELTANN. We have created the following docker images:

docs/tutorials/training/text_class_pip_example.md (+124, new file)

# A Text Classification Usage Example for pip users

## Intro

In this tutorial, we demonstrate a text classification task on a mock
demo dataset **for users who installed by pip**.

A complete process contains the following steps:

- Prepare the dataset.
- Develop custom modules (optional).
- Set the config file.
- Train a model.
- Export a model.

Please clone our demo repository:

```bash
git clone --depth 1 https://github.com/applenob/delta_demo.git
cd ./delta_demo
```
## A quick review of installation

If you haven't installed `delta-nlp` yet, please run:

```bash
pip install delta-nlp
```

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
MacOS or Linux.
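The requirement above can be checked in a couple of lines. This is a hypothetical helper (`meets_requirements` is not part of `delta-nlp`), shown only to make the version constraint concrete:

```python
import sys

def meets_requirements(py_version, tf_version):
    """True when the Python (major, minor) is 3.6 and TensorFlow is exactly 2.0.0."""
    return tuple(py_version[:2]) == (3, 6) and tf_version == "2.0.0"

# Typical call site (requires tensorflow to be importable):
# import tensorflow as tf
# assert meets_requirements(sys.version_info, tf.__version__)
```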
## Prepare the Dataset

Run the script:

```bash
./gen_data.sh
```

The generated data are in the directory `data`.

The generated data should be in the standard format for text classification, which is "label\tdocument" (one tab-separated example per line).
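As a sketch of what that format looks like in code (a hypothetical helper, not part of the demo repository), each line can be parsed by splitting on the first tab:

```python
def load_examples(path):
    """Read a label\tdocument file into a list of (label, document) pairs."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split only on the first tab, so tabs inside the document survive.
            label, document = line.split("\t", 1)
            examples.append((label, document))
    return examples
```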
## Develop custom modules (optional)

Please make sure we don't already provide the modules you need before you
decide to develop your own.

```python
@registers.model.register
class TestHierarchicalAttentionModel(HierarchicalModel):
  """Hierarchical text classification model with attention."""

  def __init__(self, config, **kwargs):
    super().__init__(config, **kwargs)

    logging.info("Initialize HierarchicalAttentionModel...")

    self.vocab_size = config['data']['vocab_size']
    self.num_classes = config['data']['task']['classes']['num_classes']
    self.use_true_length = config['model'].get('use_true_length', False)
    if self.use_true_length:
      self.split_token = config['data']['split_token']
      self.padding_token = utils.PAD_IDX
```

You need to register this module's file path in the config file
`config/han-cls.yml` (relative to the current working directory):

```yml
custom_modules:
  - "test_model.py"
```
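The `@registers.model.register` decorator above follows the common registry pattern: a decorator records each class under its name so the config file can refer to it as a string. A minimal self-contained sketch of that pattern (illustrative only, not DELTA's actual implementation):

```python
class Registry:
    """Minimal name -> class registry."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        # Store the class under its own name and return it unchanged,
        # so the decorator does not alter the decorated class.
        self._classes[cls.__name__] = cls
        return cls

    def get(self, name):
        return self._classes[name]

model_registry = Registry()

@model_registry.register
class MyModel:
    pass

# The framework can now look the model up by the string from the config:
assert model_registry.get("MyModel") is MyModel
```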
## Set the Config File

The config file of this example is `config/han-cls.yml`.

In the config file, we set the task to `TextClsTask` and the model to `TestHierarchicalAttentionModel`.

### Config Details

The config is composed of 3 parts: `data`, `model`, and `solver`.

Data-related configs are under `data`.
You can set the data paths (including the training, dev, and test sets).
The data processing configs can also be found here (mainly under `task`).
For example, we set `use_dense: false` since no dense input is used here,
and `language: chinese` since the text is Chinese.

Model parameters are under `model`. The most important config here is
`name: TestHierarchicalAttentionModel`, which specifies the model to
use. Detailed structure configs are under `net->structure`. Here,
`max_sen_len` is 32 and `max_doc_len` is 32.

The configs under `solver` are used by the solver class, including the training optimizer, evaluation metrics, and checkpoint saver.
Here the solver class is `RawSolver`.
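Putting the values quoted above together, the rough shape of `config/han-cls.yml` is sketched below. The exact nesting is an assumption from this description (only `use_dense`, `language`, the model `name`, `net->structure`, `max_sen_len`, `max_doc_len`, and the saver/service paths are quoted in the text); consult the real file in the demo repository for the authoritative layout.

```yml
data:
  task:
    name: TextClsTask
    use_dense: false
    language: chinese
model:
  name: TestHierarchicalAttentionModel
  net:
    structure:
      max_sen_len: 32
      max_doc_len: 32
solver:
  name: RawSolver
  saver:
    model_path: "exp/han-cls/ckpt"
  service:
    model_path: "exp/han-cls/service"
```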
## Train a Model

After setting the config file, you are ready to train a model:

```
delta --cmd train_and_eval --config config/han-cls.yml
```

The argument `cmd` tells the platform to train a model and also to evaluate
on the dev set during training.

After enough training steps, you will find the model checkpoints saved to the directory set by `saver->model_path`, which is `exp/han-cls/ckpt` in this case.
## Export a Model

If you would like a specific checkpoint to be exported, set `infer_model_path` in the config file. Otherwise, the platform will simply pick the newest checkpoint under the directory set by `saver->model_path`.
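The "newest checkpoint" fallback described above can be sketched in a few lines (illustrative only; the checkpoint file prefix and directory layout are assumptions, not DELTA's actual code):

```python
import os

def newest_checkpoint(model_dir, prefix="model.ckpt"):
    """Return the most recently modified checkpoint file under model_dir, or None."""
    ckpts = [os.path.join(model_dir, name)
             for name in os.listdir(model_dir) if name.startswith(prefix)]
    return max(ckpts, key=os.path.getmtime) if ckpts else None
```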
```
delta --cmd export_model --config config/han-cls.yml
```

The exported models are in the directory set by
`service->model_path`, which is `exp/han-cls/service` here.

docs/tutorials/training/text_class_example.md renamed to docs/tutorials/training/text_class_source_example.md (+5, -1)

@@ -1,6 +1,10 @@
 # A Text Classification Usage Example
 
-In this tutorial, we demonstrate a text classification task with an open source dataset: `yahoo answer`.
+## Intro
+
+In this tutorial, we demonstrate a text classification task with an
+open source dataset: `yahoo answer`, for users who installed from
+source code.
 
 A complete process contains the following steps:
setup.py (+3, -3)

@@ -15,8 +15,8 @@
 TF_INCLUDE = TF_INCLUDE.split('-I')[1]
 
 TF_LIB_INC, TF_SO_LIB = tf.sysconfig.get_link_flags()
-TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.1.dylib',
-                              '-ltensorflow_framework.1')
+TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.2.dylib',
+                              '-ltensorflow_framework.2')
 TF_LIB_INC = TF_LIB_INC.split('-L')[1]
 TF_SO_LIB = TF_SO_LIB.split('-l')[1]

@@ -100,7 +100,7 @@ def get_requires():
     description=SHORT_DESCRIPTION,
     long_description=LONG_DESCRIPTION,
     long_description_content_type="text/markdown",
-    version="0.2",
+    version="0.2.1",
     author=AUTHOR,
     author_email=AUTHOR_EMAIL,
     maintainer=MAINTAINER,
