
Commit d4e076d

Add filter latex-hyphen (#95)
* Add filter latex-hyphen.
* Hard wrap lines in README.md
* Move format check to the top.
* Remove superfluous space in function definition.
* Return early when string contains no hyphen.
* Add header with description, copyright, license.
1 parent a5dfc85 commit d4e076d


7 files changed (+155, -0 lines)


latex-hyphen/Makefile

+8
@@ -0,0 +1,8 @@
DIFF ?= diff --strip-trailing-cr -u

test:
	@pandoc --lua-filter=latex-hyphen.lua --output=output.tex sample.md
	@$(DIFF) expected.tex output.tex
	@rm -f output.tex

.PHONY: test

latex-hyphen/README.md

+49
@@ -0,0 +1,49 @@
# latex-hyphen.lua

`latex-hyphen.lua` is a [pandoc](https://pandoc.org/) filter that replaces
intra-word hyphens with the raw LaTeX expression `"=` for improved
hyphenation.

## Purpose

The regular hyphen `-` prevents LaTeX from breaking a word at any position
other than the explicit hyphen. With long hyphenated words, which are common
in languages like German, this can lead to overfull lines or stretched word
spacing. The expression `"=` outputs a normal hyphen while still allowing
LaTeX to break the word elsewhere according to its regular hyphenation rules.

Before:

![](without-filter.png)

After:

![](with-filter.png)

## Usage

For this to work, babel shorthands must be activated. With XeLaTeX or
LuaLaTeX as the PDF engine, this can be done in the YAML front matter:

```yaml
polyglossia-lang:
  name: german
  options:
    - spelling=new,babelshorthands=true

```

For pdfLaTeX, a custom template is required, because pandoc’s built-in
template explicitly deactivates babel’s shorthands.

The filter can then be called like this:

```sh
pandoc -o mydoc.pdf --pdf-engine xelatex \
  --lua-filter latex-hyphen.lua mydoc.md
```

## Caveat

pandoc strips raw LaTeX from PDF strings such as bookmarks. As a result,
hyphenated words processed by this filter appear without hyphens in the PDF outline.

latex-hyphen/expected.tex

+6
@@ -0,0 +1,6 @@
\hypertarget{das-ist-ein-silbentrennungs-test}{%
\section{\texorpdfstring{Das ist ein
Silbentrennungs"=Test}{Das ist ein SilbentrennungsTest}}\label{das-ist-ein-silbentrennungs-test}}

Sie studierte in dieser Zeit längst an der weit berühmten
Karl"=Franzens"=Universität Graz.

latex-hyphen/latex-hyphen.lua

+81
@@ -0,0 +1,81 @@
--- Replace intra-word hyphens with LaTeX shorthand "= for better hyphenation.
--
-- PURPOSE
--
-- The regular hyphen - prevents LaTeX from breaking a word anywhere but at
-- the explicit hyphen. With long hyphenated words, which are common in
-- languages like German, this can lead to overfull lines or stretched word
-- spacing. The expression "= outputs a normal hyphen while still allowing
-- LaTeX to break the word elsewhere according to its hyphenation rules.
--
-- USAGE
--
-- For this to work, babel shorthands must be activated. With XeLaTeX or
-- LuaLaTeX as the PDF engine, this can be done in the YAML front matter:
--
--     polyglossia-lang:
--       name: german
--       options:
--         - spelling=new,babelshorthands=true
--
-- For pdfLaTeX, a custom template is required, because pandoc’s built-in
-- template explicitly deactivates babel’s shorthands.
--
-- The filter can then be called like this:
--
--     pandoc -o doc.pdf --pdf-engine xelatex --lua-filter latex-hyphen.lua doc.md
--
-- AUTHOR
--
-- Copyright 2020 Frederik Elwert <frederik.elwert@rub.de>
--
-- LICENSE
--
-- Permission is hereby granted, free of charge, to any person obtaining a copy
-- of this software and associated documentation files (the "Software"), to
-- deal in the Software without restriction, including without limitation the
-- rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
-- sell copies of the Software, and to permit persons to whom the Software is
-- furnished to do so, subject to the following conditions:
--
-- The above copyright notice and this permission notice shall be included in
-- all copies or substantial portions of the Software.
--
-- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-- IN THE SOFTWARE.


if FORMAT ~= 'latex' then
  return {}  -- empty filter list: leave non-LaTeX output untouched
end

local function split_hyphen(inputstr)
  local sep = '-'
  local t = {}
  for str in string.gmatch(inputstr, '([^'..sep..']+)') do
    table.insert(t, str)
  end
  return t
end

function Str(elem)
  local parts = split_hyphen(elem.text)
  -- At most one part: no intra-word hyphen, so leave the element unchanged.
  if #parts <= 1 then
    return nil
  end
  -- Otherwise, splice raw LaTeX "= between the parts.
  local o = {}
  for index, part in ipairs(parts) do
    table.insert(o, pandoc.Str(part))
    if index < #parts then
      table.insert(o, pandoc.RawInline('latex', '"='))
    end
  end
  return o
end
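Because `split_hyphen` uses `string.gmatch` with the pattern `[^-]+`, empty fields are skipped, so a token such as `-ab-cd` would come out as `ab"=cd` and lose its leading hyphen. The following is a minimal plain-Lua sketch, runnable without pandoc, of a variant that keeps such edge hyphens and only replaces hyphens that have text on both sides. It is an assumption-laden illustration, not part of the commit: it returns a plain string instead of a list of `pandoc.Str`/`pandoc.RawInline` elements, and the names `split_keep_empty` and `hyphenate` are hypothetical.

```lua
-- Hypothetical standalone sketch (plain Lua, no pandoc needed) of the
-- string-level transformation performed by latex-hyphen.lua.

-- Split s at every '-' and keep empty fields, so that leading, trailing
-- and doubled hyphens remain visible as empty strings.
local function split_keep_empty(s)
  local parts = {}
  local start = 1
  while true do
    -- Plain search (4th argument true), since '-' is a pattern magic character.
    local i = string.find(s, '-', start, true)
    if not i then
      table.insert(parts, string.sub(s, start))
      return parts
    end
    table.insert(parts, string.sub(s, start, i - 1))
    start = i + 1
  end
end

-- Return the LaTeX text for one token: a hyphen with text on both sides
-- becomes "=, any other hyphen is kept as a plain '-'.
local function hyphenate(s)
  local parts = split_keep_empty(s)
  local out = { parts[1] }
  for i = 2, #parts do
    local intra_word = parts[i - 1] ~= '' and parts[i] ~= ''
    table.insert(out, intra_word and '"=' or '-')
    table.insert(out, parts[i])
  end
  return table.concat(out)
end

print(hyphenate('Karl-Franzens-Universität'))  --> Karl"=Franzens"=Universität
print(hyphenate('-ab-cd'))                      --> -ab"=cd
```

Inside the filter, the same both-sides check could be applied while building the inline list in `Str`, emitting `pandoc.Str('-')` instead of the raw `"=` for edge hyphens.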

latex-hyphen/sample.md

+11
@@ -0,0 +1,11 @@
---
lang: de
polyglossia-lang:
  name: german
  options:
    - spelling=new,babelshorthands=true
...

# Das ist ein Silbentrennungs-Test

Sie studierte in dieser Zeit längst an der weit berühmten Karl-Franzens-Universität Graz.

latex-hyphen/with-filter.png

20.2 KB

latex-hyphen/without-filter.png

20.8 KB
