HTTP to HTTPS #134

Open
wants to merge 1 commit into
base: main
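
Every hunk in this diff applies the same mechanical edit: the `http://` scheme in a link becomes `https://`. As a rough illustration only, and not the method used to produce this PR, a rewrite of that kind could be scripted as follows. The function name `upgrade_scheme` and the `SKIP_HOSTS` list are hypothetical. The skip list is an assumption of this sketch, included because XML namespace names and DOCTYPE system identifiers (such as the `www.w3.org` ones in the hOCR sample) are compared as literal strings rather than fetched.

```python
import re

# Assumption of this sketch: URLs on these hosts are identifiers, not links,
# so their scheme is left untouched.
SKIP_HOSTS = ("www.w3.org",)

# Matches the scheme plus the host part of a plain http:// URL.
HTTP_URL = re.compile(r"http://([A-Za-z0-9.-]+)")


def upgrade_scheme(text, skip_hosts=SKIP_HOSTS):
    """Rewrite http:// to https:// except for hosts listed in skip_hosts."""
    def repl(match):
        host = match.group(1)
        if host in skip_hosts:
            return match.group(0)  # keep the original URL unchanged
        return "https://" + host
    return HTTP_URL.sub(repl, text)


if __name__ == "__main__":
    print(upgrade_scheme("[Leptonica](http://www.leptonica.org/)"))
```

A script like this only changes the text. Whether each host actually serves HTTPS still has to be checked separately.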
2 changes: 1 addition & 1 deletion APIExample.md
@@ -325,7 +325,7 @@ print 'Tesseract-ocr version', tesseract_version
print result_text
```

Example of passing python file object to C-API can be found at [pastebin](http://pastebin.com/yDTkNfNm).
Example of passing python file object to C-API can be found at [pastebin](https://pastebin.com/yDTkNfNm).

Example of extracting orientation from Tesseract 4.0:

78 changes: 39 additions & 39 deletions AddOns.md
@@ -10,21 +10,21 @@ Platform support depends on used language and experience of user.

#### Box file editors

[jTessBoxEditor](http://vietocr.sourceforge.net/training.html)
[jTessBoxEditor](https://vietocr.sourceforge.net/training.html)

### For Tesseract 3.0x

#### Box file editors

| **Name** | **Last update** | **Language** | Multipage support |
|:---------|:----------------|:-------------|:------------------|
| [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) | 2023 | Java | yes |
| [QT Box Editor](http://zdenop.github.com/qt-box-editor/) | 2019 | C++, Qt4/Qt5 | yes |
| [jTessBoxEditor](https://vietocr.sourceforge.net/training.html) | 2023 | Java | yes |
| [QT Box Editor](https://zdenop.github.com/qt-box-editor/) | 2019 | C++, Qt4/Qt5 | yes |
| [tesseract-box-editor](https://github.com/scotts48/tesseract-box-editor) | 2013 | .NET 4 | yes |
| [Tesseract-OCR boxfile AJAX editor](http://pp19dd.com/tesseract-ocr-chopper/) | 2012 | online tool |
| [cowboxer](http://code.google.com/p/cowboxer/) | 2012 | C++, Qt4 | no |
| [moshPyTT ](http://code.google.com/p/moshpytt/) | 2011 | Python, GTK2 | no |
| [pytesseracttrainer](http://code.google.com/p/pytesseracttrainer/) | 2011 | Python, GTK2 | no |
| [Tesseract-OCR boxfile AJAX editor](https://pp19dd.com/tesseract-ocr-chopper/) | 2012 | online tool |
| [cowboxer](https://code.google.com/p/cowboxer/) | 2012 | C++, Qt4 | no |
| [moshPyTT ](https://code.google.com/p/moshpytt/) | 2011 | Python, GTK2 | no |
| [pytesseracttrainer](https://code.google.com/p/pytesseracttrainer/) | 2011 | Python, GTK2 | no |


### For Tesseract-OCR 2.0x
@@ -34,36 +34,36 @@ Platform support depends on used language and experience of user.

| **Name** | **Last update** | **Language** |
|:---------|:----------------|:-------------|
| [Tesseract-OCR boxfile AJAX editor](http://pp19dd.com/tesseract-ocr-chopper/) | 2012 | online tool |
| [owlboxer](http://code.google.com/p/owlboxer/) | 2010 | C++, Qt4 |
| [Tessboxer](http://sites.google.com/site/spilkaondrej) | 2009 | .NET |
| [boxfilereader.php](http://tesseract-ocr.googlecode.com/files/boxfilereader.php) | 2009 | php |
| [tessboxes](http://www.lbreyer.com/tessboxes.html) | 2008 | C |
| [JTesseract](http://code.google.com/p/jtesseract/) | 2008 | C# |
| [wx-tetra](http://code.google.com/p/wx-tetra/) | 2008 | perl, wx |
| [bbtesseract](http://code.google.com/p/bbtesseract/) | 2008 | VB.NET 2008 |
| [Tesseract-OCR boxfile AJAX editor](https://pp19dd.com/tesseract-ocr-chopper/) | 2012 | online tool |
| [owlboxer](https://code.google.com/p/owlboxer/) | 2010 | C++, Qt4 |
| [Tessboxer](https://sites.google.com/site/spilkaondrej) | 2009 | .NET |
| [boxfilereader.php](https://tesseract-ocr.googlecode.com/files/boxfilereader.php) | 2009 | php |
| [tessboxes](https://www.lbreyer.com/tessboxes.html) | 2008 | C |
| [JTesseract](https://code.google.com/p/jtesseract/) | 2008 | C# |
| [wx-tetra](https://code.google.com/p/wx-tetra/) | 2008 | perl, wx |
| [bbtesseract](https://code.google.com/p/bbtesseract/) | 2008 | VB.NET 2008 |


## Other Training Tools

* [jTessBoxEditor](http://vietocr.sourceforge.net/training.html) - Box Editor and Training Tool
* [jTessBoxEditor](https://vietocr.sourceforge.net/training.html) - Box Editor and Training Tool

* [MzTesseract](https://github.com/mazluta/MzTesseract) - MS Windows program that can train new language from top to bottom
* [FrankenPlus](https://github.com/this-is-ari/python-tesseract-3.02-training) - tool for creating font training for Tesseract OCR engine from page images. More information about Franken+ is at at [IT'S ALIVE!](http://emop.tamu.edu/node/54Franken+:) and [Franken+ homepage](http://dh-emopweb.tamu.edu/Franken+/).
* [FrankenPlus](https://github.com/this-is-ari/python-tesseract-3.02-training) - tool for creating font training for Tesseract OCR engine from page images. More information about Franken+ is at at [IT'S ALIVE!](https://emop.tamu.edu/node/54Franken+:) and [Franken+ homepage](http://dh-emopweb.tamu.edu/Franken+/).
* [python-tesseract-3.02-training](https://github.com/this-is-ari/python-tesseract-3.02-training) - script to automate the generation of Tesseract 3.02 training files
* [tesseract-box-file](https://code.google.com/p/tesseract-box-file/) - autoit script to make editing the box file easier
* [Serak Tesseract Trainer for Tesseract 3.02](https://code.google.com/p/serak-tesseract-trainer/) - a front end GUI for training tesseract 3.02
* [BoxMaker](http://reza1615.github.com/index.html) is online tool for generating image&box pair. Offline version is available in download section of [PersianOCR project](https://github.com/reza1615/PersianOcr/downloads)
* [boxFactory](http://www.dinosaursandmoustaches.com/boxFactory.php) is a tool for quickly creating box files to train the Tesseract OCR engine. You can identify characters in the image by simply drawing boxes around them.
* [BoxMaker](https://reza1615.github.com/index.html) is online tool for generating image&box pair. Offline version is available in download section of [PersianOCR project](https://github.com/reza1615/PersianOcr/downloads)
* [boxFactory](https://www.dinosaursandmoustaches.com/boxFactory.php) is a tool for quickly creating box files to train the Tesseract OCR engine. You can identify characters in the image by simply drawing boxes around them.
* https://github.com/BaltoRouberol/TesseractTrainer - TesseractTrainer is a simple Python API, taking over the tedious process of manually training Tesseract3
* [tess\_school](https://github.com/ddohler/tess_school) - a set of handy scripts to make the tesseract training process a bit easier
* [txt2img](http://code.google.com/p/txt2img/) - Qt GUI application that generates image and box file based on text input
* [DangAmbigs Generator](http://www.cs.toronto.edu/~mreimer/tesseract.html) - Creates a DangAmbigs file automatically given a set of OCR text output and correct text. _Requirements:_ Python
* [train.ps1](http://sourceforge.net/p/vietocr/code/HEAD/tree/jTessBoxEditor/trunk/tools/) - Windows powershell script for Automate Tesseract 3.01 language data pack generation process.
* [Update unicharambigs.exe](http://code.google.com/p/tesseract-ocr/issues/detail?id=544) - A small (windows) C# program for editing "lang.unicharambigs" file
* [train\_tess.pl](http://code.google.com/p/tesseract-ocr/issues/detail?id=640) - perl script to facilitate training
* [txt2img](https://code.google.com/p/txt2img/) - Qt GUI application that generates image and box file based on text input
* [DangAmbigs Generator](https://www.cs.toronto.edu/~mreimer/tesseract.html) - Creates a DangAmbigs file automatically given a set of OCR text output and correct text. _Requirements:_ Python
* [train.ps1](https://sourceforge.net/p/vietocr/code/HEAD/tree/jTessBoxEditor/trunk/tools/) - Windows powershell script for Automate Tesseract 3.01 language data pack generation process.
* [Update unicharambigs.exe](https://code.google.com/p/tesseract-ocr/issues/detail?id=544) - A small (windows) C# program for editing "lang.unicharambigs" file
* [train\_tess.pl](https://code.google.com/p/tesseract-ocr/issues/detail?id=640) - perl script to facilitate training
* [boxedit](https://github.com/danvk/boxedit/) - A web-based editor for Tesseract box files
* [TrainYourTesseract](http://trainyourtesseract.com) - Free online "no-hassle" TTF file to trainedata converter
* [TrainYourTesseract](https://trainyourtesseract.com) - Free online "no-hassle" TTF file to trainedata converter


## Community training projects
@@ -72,20 +72,20 @@ Platform support depends on used language and experience of user.
* **MRZ**: https://groups.google.com/group/tesseract-ocr/attach/10d7c711c9cc80/mrz.traineddata
* **Latin**: https://github.com/ryanfb/latinocr-lattraining
* **tesseract-georgian**: https://github.com/ddohler/tesseract-georgian
* **Polish Fraktur**: training as [result of the IMPACT project](http://dl.psnc.pl/activities/projekty/impact/results/), [trained dataset](http://dl.psnc.pl/download/tesseract_traineddata.zip)
* **Ancient Greek**: http://ancientgreekocr.org
* **Indic**: http://code.google.com/p/tesseractindic/, https://github.com/debayan/Tesseract-Indic-OCR/, http://code.google.com/p/parichit/ (All are Obsolete)
* **Indic-OCR** http://indic-ocr.github.io/tessdata/
* **Polish Fraktur**: training as [result of the IMPACT project](https://dl.psnc.pl/activities/projekty/impact/results/), [trained dataset](http://dl.psnc.pl/download/tesseract_traineddata.zip)
* **Ancient Greek**: https://ancientgreekocr.org
* **Indic**: https://code.google.com/p/tesseractindic/, https://github.com/debayan/Tesseract-Indic-OCR/, http://code.google.com/p/parichit/ (All are Obsolete)
* **Indic-OCR** https://indic-ocr.github.io/tessdata/
* **Irish uncial**: https://github.com/jimregan/tesseract-gle-uncial
* **Polish**: http://code.google.com/p/tesseract-polish/
* **Polish**: https://code.google.com/p/tesseract-polish/
* **Fraktur** (dan, deu, swe): https://github.com/paalberti/tesseract-dan-fraktur
* **Myanmar**: http://code.google.com/p/myaocr/
* **Myanmar**: https://code.google.com/p/myaocr/
* **Persian (Farsi)**: https://github.com/reza1615/PersianOcr
* **7 segments font**: https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital

## Ports

* [Project Naptha](http://projectnaptha.com/)
* [Project Naptha](https://projectnaptha.com/)
* [tesseract.js-core](https://github.com/naptha/tesseract.js-core) - Emscripten port of Tesseract C++ API
* [tesseract.js](https://github.com/naptha/tesseract.js) - Pure Javascript OCR

@@ -94,7 +94,7 @@ Platform support depends on used language and experience of user.
### Tesseract 4.0x

**Java**
* [tess4j](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/
* [tess4j](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - https://tess4j.sourceforge.net/
* [bytedeco](https://github.com/bytedeco/javacpp-presets/tree/master/tesseract) - Java configuration and interface classes for Tesseract based on the [JavaCPP-Presets](https://github.com/bytedeco/javacpp-presets) library from https://bytedeco.org

**Python**
@@ -143,7 +143,7 @@ Platform support depends on used language and experience of user.
* [tesseract-sip](https://github.com/virtuald/python-tesseract-sip) - A python SIP wrapper for libtesseract (Apache license)
* [pytesseract](https://github.com/madmaze/pytesseract) - a wrapper class for Tesseract OCR (requires tesseract executable)
* [python-tesseract](https://github.com/cookbrite/python-tesseract/commits/master) - A wrapper class for Tesseract OCR that allows any conventional image files (SWIG based)
* http://code.google.com/p/pytess/ - A simple SWIG-based interface to Tesseract
* https://code.google.com/p/pytess/ - A simple SWIG-based interface to Tesseract
* [aiopytesseract](https://github.com/amenezes/aiopytesseract) - asyncio tesseract wrapper for Tesseract-OCR.

**R**
@@ -155,7 +155,7 @@ Platform support depends on used language and experience of user.

**Java**
* [bytedeco](https://github.com/bytedeco/javacpp-presets/tree/master/tesseract) - Java configuration and interface classes for Tesseract based on 'JavaCPP-Presets' library from https://bytedeco.org - https://github.com/bytedeco/javacpp-presets
* [tess4j](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/
* [tess4j](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - https://tess4j.sourceforge.net/

**Node.js**
* [penteract](https://github.com/kaelzhang/node-penteract) - The native node.js bindings to the Tesseract OCR project.
@@ -176,11 +176,11 @@ Platform support depends on used language and experience of user.
### Tesseract 2.0x

**Python**
* http://code.google.com/p/pytesser/
* http://code.google.com/p/tesseract-python (pytesser clone)
* https://code.google.com/p/pytesser/
* https://code.google.com/p/tesseract-python (pytesser clone)

**.NET**
* http://www.pixel-technology.com/freeware/tessnet2/
* https://www.pixel-technology.com/freeware/tessnet2/

**Java**
* [tess4j (0.4)](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - http://tess4j.sourceforge.net/
* [tess4j (0.4)](https://github.com/nguyenq/tess4j) - JNA wrapper. Docs and discussions - https://tess4j.sourceforge.net/
4 changes: 2 additions & 2 deletions Command-Line-Usage.md
@@ -177,8 +177,8 @@ Partial Output
```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
"https://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="https://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
24 changes: 12 additions & 12 deletions Compiling.md
@@ -36,7 +36,7 @@ The following instructions are for building on Linux, which also can be applied
* A compiler for C and C++: GCC or Clang
* GNU Autotools: autoconf, automake, libtool
* pkg-config
* [Leptonica](http://www.leptonica.org/)
* [Leptonica](https://www.leptonica.org/)
* (optional) zlib, libpng, libjpeg, libtiff, giflib, openjpeg, webp, archive, curl


@@ -66,16 +66,16 @@ sudo apt-get install libcairo2-dev

### Leptonica

You also need to install [Leptonica](http://www.leptonica.org/). Ensure that the development headers for Leptonica are installed before compiling Tesseract.
You also need to install [Leptonica](https://www.leptonica.org/). Ensure that the development headers for Leptonica are installed before compiling Tesseract.

Tesseract versions and the minimum version of Leptonica required:

**Tesseract** | **Leptonica** | **Ubuntu**
:-------------------: | :---------------------------------------: | :---------
4.00 | 1.74.2 | [Ubuntu 18.04](https://packages.ubuntu.com/bionic/tesseract-ocr)
3.05 | 1.74.0 | Must build from source
3.04 | 1.71 | [Ubuntu 16.04](http://packages.ubuntu.com/xenial/tesseract-ocr)
3.03 | 1.70 | [Ubuntu 14.04](http://packages.ubuntu.com/trusty/tesseract-ocr)
3.04 | 1.71 | [Ubuntu 16.04](https://packages.ubuntu.com/xenial/tesseract-ocr)
3.03 | 1.70 | [Ubuntu 14.04](https://packages.ubuntu.com/trusty/tesseract-ocr)
3.02 | 1.69 | Ubuntu 12.04
3.01 | 1.67 |

@@ -87,9 +87,9 @@ sudo apt-get install libleptonica-dev

**but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.**

The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in [Leptonica README](http://www.leptonica.org/source/README.html).
The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in [Leptonica README](https://www.leptonica.org/source/README.html).

Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at [Stackoverflow](http://stackoverflow.com/questions/4743233/is-usr-local-lib-searched-for-shared-libraries) is very helpful.
Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at [Stackoverflow](https://stackoverflow.com/questions/4743233/is-usr-local-lib-searched-for-shared-libraries) is very helpful.


## Installing Tesseract from Git
@@ -266,12 +266,12 @@ If you have Visual Studio 2015, checkout the https://github.com/peirick/VS2015_T

## 3.03rc-1

Have a look at blog [How to build Tesseract 3.03 with Visual Studio 2013](http://vorba.ch/2014/tesseract-3.03-vs2013.html).
Have a look at blog [How to build Tesseract 3.03 with Visual Studio 2013](https://vorba.ch/2014/tesseract-3.03-vs2013.html).


## 3.02

For tesseract-ocr 3.02 please follow instruction in [Visual Studio 2008 Developer Notes for Tesseract-OCR](http://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources).
For tesseract-ocr 3.02 please follow instruction in [Visual Studio 2008 Developer Notes for Tesseract-OCR](https://tesseract-ocr.googlecode.com/svn/trunk/vs2008/doc/setup.html#using-the-latest-tesseractocr-sources).


## 3.01
@@ -289,7 +289,7 @@ Windows relevant files are located in vs2008 directory (e.g. `tesseract-3.01\vs2

## Mingw+Msys

For Mingw+Msys have a look at blog [Compiling Leptonica and Tesseract-ocr with Mingw+Msys](http://www.sk-spell.sk.cx/compiling-leptonica-and-tesseract-ocr-with-mingwmsys).
For Mingw+Msys have a look at blog [Compiling Leptonica and Tesseract-ocr with Mingw+Msys](https://www.sk-spell.sk.cx/compiling-leptonica-and-tesseract-ocr-with-mingwmsys).


## Msys2
@@ -307,7 +307,7 @@ To build the tesseract-ocr release package, use PKGBUILD from https://github.com

## Cygwin

To build on Cygwin have a look at blog [How to build Tesseract on Cygwin](http://vorba.ch/2014/tesseract-cygwin.html).
To build on Cygwin have a look at blog [How to build Tesseract on Cygwin](https://vorba.ch/2014/tesseract-cygwin.html).

Tesseract as well as the training utilities for 3.04.00 onwards are available as Cygwin packages.

@@ -324,7 +324,7 @@ tesseract-training-util 3.04.01-1

## Mingw-w64

[Mingw-w64](http://mingw-w64.org/) allows building 32- or 64-bit executables for Windows.
[Mingw-w64](https://mingw-w64.org/) allows building 32- or 64-bit executables for Windows.
It can be used for native compilations on Windows,
but also for cross compilations on Linux (which are easier and faster than native compilations).
Most large Linux distributions already contain packages with the tools need for a cross build.
@@ -631,4 +631,4 @@ In this case you must create m4 directory (`mkdir m4`), and then rerun the above

# Miscellaneous

* [Standalone Tesseract build bash script](http://pastebin.com/VnGLHfbr)
* [Standalone Tesseract build bash script](https://pastebin.com/VnGLHfbr)
2 changes: 1 addition & 1 deletion Downloads.md
@@ -12,7 +12,7 @@ Tesseract is included in most Linux distributions.

### Old Downloads

[Downloads Archive on SourceForge](http://sourceforge.net/projects/tesseract-ocr-alt/files/).
[Downloads Archive on SourceForge](https://sourceforge.net/projects/tesseract-ocr-alt/files/).
There you can find, among other files, Windows installer for the **old** version 3.02.

Currently, there is no **official** Windows installer for newer versions.