README.md +17 -14
@@ -64,7 +64,7 @@ The scripts assume a corpus format of one sentence per line in UTF-8 encoded (op
 | Count |`representations/count.py`| VSM ||
 | PPMI |`representations/ppmi.py`| VSM ||
 | SVD |`representations/svd.py`| VSM ||
-| RI |`representations/ri.py`| VSM |- use `-a` for good performance|
+| RI |`representations/ri.py`| VSM ||
 | SGNS |`representations/sgns.py`| VSM ||
 | SCAN |[repository](https://github.com/ColiLea/scan)| TPM | - different corpus input format |
 
@@ -75,7 +75,7 @@ Table: VSM=Vector Space Model, TPM=Topic Model
 |Name | Code | Applicability | Comment |
 | --- | --- | --- | --- |
 | CI |`alignment/ci_align.py`| Count, PPMI ||
-| SRV |`alignment/srv_align.py`| RI | - use `-a` for good performance <br> - consider using the efficient and more powerful [TRIPY](https://github.com/Garrafao/TRIPY)|
+| SRV |`alignment/srv_align.py`| RI | - consider using the more powerful [TRIPY](https://github.com/Garrafao/TRIPY)|
 | OP |`alignment/map_embeddings.py`| SVD, RI, SGNS | - drawn from [VecMap](https://github.com/artetxem/vecmap) <br> - for OP- and OP+ see `scripts/`|
 | VI |`alignment/sgns_vi.py`| SGNS | - bug fixes 27/12/19 (see script for details) |
 | WI |`alignment/wi.py`| Count, PPMI, SVD, RI, SGNS | - consider using the more advanced [Temporal Referencing](https://github.com/Garrafao/TemporalReferencing)|
@@ -99,11 +99,11 @@ Find detailed notes on model performances and optimal parameter settings in [the
 
 The evaluation framework of this repository is based on the comparison of a set of target words across two corpora. Hence, models can be evaluated on a triple (dataset, corpus1, corpus2), where the dataset provides gold values for the change of target words between corpus1 and corpus2.
 
-| Dataset | Corpus 1 | Corpus 2 | Download | Comment |
-| --- | --- | --- | --- | --- |
-| DURel | DTA18 | DTA19 |[Dataset](https://www.ims.uni-stuttgart.de/data/durel), [Corpora](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/wocc)| - version from Schlechtweg et al. (2019) at `testsets/durel/`|
-| SURel | SDEWAC | COOK |[Dataset](https://www.ims.uni-stuttgart.de/data/surel), [Corpora](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/wocc)| - version from Schlechtweg et al. (2019) at `testsets/surel/`|
+| DURel |German |DTA18 | DTA19 |[Dataset](https://www.ims.uni-stuttgart.de/data/durel), [Corpora](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/wocc)| - version from Schlechtweg et al. (2019) at `testsets/durel/`|
+| SURel |German |SDEWAC | COOK |[Dataset](https://www.ims.uni-stuttgart.de/data/surel), [Corpora](https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/wocc)| - version from Schlechtweg et al. (2019) at `testsets/surel/`|
 We provide several evaluation pipelines, downloading the corpora and evaluating the models on the above-mentioned datasets, see [pipeline](#pipeline).
 
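The evaluation framework described above compares a model's change scores for the target words with the gold values a dataset provides. One common way to make that comparison is a rank correlation; the sketch below assumes Spearman's ρ and is not taken from this repository's evaluation scripts. The function names and the toy scores are hypothetical.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Spearman's rho over the target words scored in both dicts.

    Computed as the Pearson correlation of average ranks; raises
    ZeroDivisionError if either side gives every target the same value.
    """
    targets = sorted(set(gold) & set(predicted))
    g = average_ranks([gold[t] for t in targets])
    p = average_ranks([predicted[t] for t in targets])
    n = len(targets)
    mean_g, mean_p = sum(g) / n, sum(p) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(g, p))
    var_g = sum((a - mean_g) ** 2 for a in g)
    var_p = sum((b - mean_p) ** 2 for b in p)
    return cov / (var_g * var_p) ** 0.5


if __name__ == "__main__":
    # Hypothetical gold change values and model scores for three target words.
    gold = {"target_a": 0.2, "target_b": 1.5, "target_c": 3.0}
    predicted = {"target_a": 0.05, "target_b": 0.40, "target_c": 0.31}
    print(spearman(gold, predicted))
```

Only the ranking of the scores matters here, so a model's change scores do not need to be on the same scale as the gold values.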
@@ -140,6 +140,7 @@ As is the scripts will reproduce the results from Schlechtweg et al. (2019) and
 
 - September 1, 2019: Python scripts were updated from Python 2 to Python 3.
 - December 27, 2019: bug fixes in `alignment/sgns_vi.py` (see script for details)
+- March 23, 2020: updates in `representations/ri.py` and `alignment/srv_align.py` (see scripts for details)
 
 ### Error Sources
 
@@ -153,19 +154,21 @@ BibTex
 title = {{A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains}},
 author = {Dominik Schlechtweg and Anna H\"{a}tty and Marco del Tredici and Sabine {Schulte im Walde}},
 booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
-year = {2019},
-address = {Florence, Italy},
-publisher = {Association for Computational Linguistics},
-pages = {732--746}
+year = {2019},
+address = {Florence, Italy},
+publisher = {Association for Computational Linguistics},
+pages = {732--746},
+doi = {10.18653/v1/P19-1072}
 }
 ```
 ```
 @inproceedings{SchlechtwegWalde20,
 title = {{Simulating Lexical Semantic Change from Sense-Annotated Data}},
 author = {Dominik Schlechtweg and Sabine {Schulte im Walde}},
 year = {2020},
-booktitle = {{The Evolution of Language: Proceedings of the 13th International Conference (EVOLANGXIII)}},
-editor = {C. Cuskley and M. Flaherty and H. Little and Luke McCrohon and A. Ravignani and T. Verhoef},
-publisher = {Online at {}},
+booktitle = {{The Evolution of Language: Proceedings of the 13th International Conference (EvoLang13)}},
+editor = {Ravignani, A. and Barbieri, C. and Martins, M. and Flaherty, M. and Jadoul, Y. and Lattenkamp, E. and Little, H. and Mudd, K. and Verhoef, T.},
 <seeds> = number of non-zero values in each random vector
 <matrixPath1> = path to matrix1
 <matrixPath2> = path to matrix2
 <outPath1> = output path for aligned space 1
 <outPath2> = output path for aligned space 2
-<outPathElement> = output path for elemental space (context vectors)
 <dim> = number of dimensions for random vectors
-<t> = threshold for downsampling (if t=None, no subsampling is applied)
 
 Options:
 -l, --len normalize final vectors to unit length
--s, --see specify number of seeds manually
--a, --aut calculate number of seeds automatically as proposed in [1,2]
+
+Note:
+Assumes intersected and ordered columns. Parameters -s, -a and <t> have been removed from an earlier version for efficiency. Also, columns are now intersected instead of unified.
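The docstring above describes random vectors with `<dim>` dimensions and `<seeds>` non-zero values each, and its note assumes the two input matrices share intersected, identically ordered columns. The sketch below illustrates why, assuming a standard random-indexing construction (one sparse random vector of +1/-1 entries per context word) rather than reproducing the script: both count matrices are multiplied by the same random matrix, so the reduced spaces share one coordinate system. The functions `random_index_matrix` and `project` and the toy counts are hypothetical.

```python
import numpy as np


def random_index_matrix(n_contexts: int, dim: int, seeds: int, rng_seed: int = 0) -> np.ndarray:
    """One sparse ternary random vector per context word (row), with `seeds` entries of +1 or -1."""
    if seeds > dim:
        raise ValueError("seeds cannot exceed dim")
    rng = np.random.default_rng(rng_seed)
    random_vectors = np.zeros((n_contexts, dim))
    for row in range(n_contexts):
        positions = rng.choice(dim, size=seeds, replace=False)  # distinct non-zero slots
        random_vectors[row, positions] = rng.choice([-1.0, 1.0], size=seeds)
    return random_vectors


def project(counts: np.ndarray, random_vectors: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Each target vector is the count-weighted sum of its context words' random vectors."""
    reduced = counts @ random_vectors
    if normalize:  # analogous to the -l/--len option: scale rows to unit length
        norms = np.linalg.norm(reduced, axis=1, keepdims=True)
        reduced = reduced / np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged
    return reduced


if __name__ == "__main__":
    # Hypothetical toy counts: 2 targets x 4 shared, identically ordered context words.
    counts1 = np.array([[3, 0, 1, 0], [0, 2, 0, 5]], dtype=float)
    counts2 = np.array([[2, 1, 1, 0], [0, 0, 0, 4]], dtype=float)
    shared = random_index_matrix(n_contexts=4, dim=10, seeds=2)
    space1 = project(counts1, shared, normalize=True)
    space2 = project(counts2, shared, normalize=True)
    # Rows are comparable across spaces because both used the same random vectors.
    print((space1 * space2).sum(axis=1))  # cosine similarity per target
```

If the column orders differed between the two matrices, the same random vector would be attached to different context words in each corpus, and the reduced spaces would no longer be comparable.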
 new_columns1=csc_matrix((len(rows1),len(columns_diff1))) # Get empty columns for additional context words
-unified_matrix1=csc_matrix(hstack((matrix1,new_columns1)))[:,sorted(cj2i1, key=cj2i1.get)] # First concatenate matrix and empty columns and then order columns according to unified_columns
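The removed line above built a unified column set by padding each matrix with empty columns, while the note in the docstring states that columns are now intersected instead. Below is a minimal sketch of that intersection, assuming each matrix arrives with a list of its column (context-word) labels. `intersect_columns` is a hypothetical helper. Dense NumPy arrays are used for brevity, although the same list-based column indexing also works on `scipy.sparse.csc_matrix`, as in the original line.

```python
from typing import List, Tuple

import numpy as np


def intersect_columns(
    matrix1: np.ndarray, columns1: List[str], matrix2: np.ndarray, columns2: List[str]
) -> Tuple[np.ndarray, np.ndarray, List[str]]:
    """Keep only context words present in both matrices, in one shared (sorted) order."""
    shared = sorted(set(columns1) & set(columns2))
    index1 = {c: i for i, c in enumerate(columns1)}
    index2 = {c: i for i, c in enumerate(columns2)}
    # Select and reorder columns so that column k is the same context word in both matrices.
    aligned1 = matrix1[:, [index1[c] for c in shared]]
    aligned2 = matrix2[:, [index2[c] for c in shared]]
    return aligned1, aligned2, shared


if __name__ == "__main__":
    m1 = np.array([[1, 2, 3]])  # hypothetical columns: cat, dog, fish
    m2 = np.array([[4, 5]])     # hypothetical columns: fish, cat
    a1, a2, cols = intersect_columns(m1, ["cat", "dog", "fish"], m2, ["fish", "cat"])
    print(cols, a1, a2)  # cols == ['cat', 'fish'], a1 == [[1, 3]], a2 == [[5, 4]]
```

Intersection drops context words that occur in only one corpus, so the matrices stay smaller than with the padded union. This fits the efficiency motivation given in the note.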