Fix typos #5

Open
wants to merge 1 commit into
base: main
4 changes: 2 additions & 2 deletions README.md
@@ -59,7 +59,7 @@ PDFSyntax is mostly made of simple functions. Example:

The Doc object is probably the only dedicated class you will need to handle. It is a black box that stores all the internal states of a document:
- content that is cached/memoized from an original file,
- modifications that add/modifiy/delete content and that are tracked as incremental updates.
- modifications that add/modify/delete content and that are tracked as incremental updates.

```Python
>>> doc
@@ -97,6 +97,6 @@ You then can write the modified PDF to disk. Note that the resulting file contai
## Open-Source, not Open-Contribution yet

PDFSyntax is [MIT licensed](https://github.com/desgeeko/pdfsyntax/blob/main/LICENCE) but is currently closed to contributions.
> Personal note: this is a pet projet of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised.
> Personal note: this is a pet project of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised.


4 changes: 2 additions & 2 deletions docs/README.html
@@ -50,7 +50,7 @@ <h2>API overview</h2>
</pre>
<p>The Doc object is probably the only dedicated class you will need to handle. It is a black box that stores all the internal states of a document:</p>
<ul><li>content that is cached/memoized from an original file,</li>
<li>modifications that add/modifiy/delete content and that are tracked as incremental updates.</li>
<li>modifications that add/modify/delete content and that are tracked as incremental updates.</li>
</ul>
<pre>&gt;&gt;&gt; doc
&lt;PDF Doc in revision 1 with 0 modified object(s)&gt;
@@ -72,7 +72,7 @@ <h2>API overview</h2>
<p></p>
<h2>Open-Source, not Open-Contribution yet</h2>
<p>PDFSyntax is <a href='https://github.com/desgeeko/pdfsyntax/blob/main/LICENCE'>MIT licensed</a> but is currently closed to contributions.</p>
<blockquote><p> Personal note: this is a pet projet of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. </p>
<blockquote><p> Personal note: this is a pet project of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. </p>
</blockquote>
<p></p>

4 changes: 2 additions & 2 deletions docs/api.html
@@ -48,7 +48,7 @@ <h3>Low-level access to object tree</h3>
<p><code>1j</code> is a complex number (!) representing indirect reference <code>1 0 R</code>. Why? Because the approach is to map PDF object types to Python basic built-in types as much as possible, and it is a concise way to show both the object number (as the imaginary part) and the generation number (as the real part). Moreover the generation is very often equal to zero, so the real part is not shown. You may think of the <code>j</code> as a "jump" to another object :)</p>
<p><code>get_object</code> gives direct access to indirect objects.</p>
<pre>&gt;&gt;&gt; #Access to document catalog, given that the trailer redirects to 1j for root
&gt;&gt;&gt; #(equivalent to catalog fonction)
&gt;&gt;&gt; #(equivalent to catalog function)
&gt;&gt;&gt; doc.get_object(1j)
{'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
</pre>
@@ -62,7 +62,7 @@ <h3>Pages</h3>
&gt;&gt;&gt; #(In this example, nothing is inherited from upper nodes)
</pre>
<p>The <code>page</code> function goes further by merging inherited attributes with local attributes of each page and giving the result in a list.</p>
<pre>&gt;&gt;&gt; #Equivalent list with computed page attribues
<pre>&gt;&gt;&gt; #Equivalent list with computed page attributes
&gt;&gt;&gt; pdf.pages(doc)
[{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j},
'/Contents': 5j,
4 changes: 2 additions & 2 deletions docs/api.md
@@ -53,7 +53,7 @@ You may think of the `j` as a "jump" to another object :)

```Python
>>> #Access to document catalog, given that the trailer redirects to 1j for root
>>> #(equivalent to catalog fonction)
>>> #(equivalent to catalog function)
>>> doc.get_object(1j)
{'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
```
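
The complex-number convention described for indirect references can be sketched in plain Python. The helper names `ref_to_complex` and `complex_to_ref` below are hypothetical and are not part of the PDFSyntax API; they only illustrate how the object number maps to the imaginary part and the generation number to the real part.

```python
def ref_to_complex(obj_num: int, gen_num: int = 0) -> complex:
    """Hypothetical helper: encode 'obj_num gen_num R' as a complex number."""
    # Generation number is the real part, object number is the imaginary part.
    return complex(gen_num, obj_num)


def complex_to_ref(ref: complex) -> str:
    """Hypothetical helper: render a complex reference in PDF syntax."""
    return f"{int(ref.imag)} {int(ref.real)} R"


# With a zero generation, Python's repr hides the real part.
catalog_ref = ref_to_complex(1)
print(repr(catalog_ref))            # 1j
print(complex_to_ref(catalog_ref))  # 1 0 R
```

Because the generation is usually zero, most references print in the compact `Nj` form seen in the examples above.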
@@ -73,7 +73,7 @@ Page index is a tree structure where attributes can be inherited from parent nod
The `page` function goes further by merging inherited attributes with local attributes of each page and giving the result in a list.

```Python
>>> #Equivalent list with computed page attribues
>>> #Equivalent list with computed page attributes
>>> pdf.pages(doc)
[{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j},
'/Contents': 5j,
2 changes: 1 addition & 1 deletion docs/cli.html
@@ -38,7 +38,7 @@ <h3><code>fonts</code></h3>
<li>Number of pages where it occurs</li>
</ul>
<h3><code>browse</code></h3>
<p>This command generates HTML output that looks like the raw PDF file with additionnal hyperlinks and information that expose its internal structure and relations between its objects.Redirect the standard output to a file that you can open in your browser:</p>
<p>This command generates HTML output that looks like the raw PDF file with additional hyperlinks and information that expose its internal structure and relations between its objects. Redirect the standard output to a file that you can open in your browser:</p>
<pre> python3 -m pdfsyntax browse file.pdf &gt; inspection_file.html
</pre>
<p>Please refer to the <a href='https://github.com/desgeeko/pdfsyntax/blob/main/docs/browse.md'>Browse article</a> for details.</p>
2 changes: 1 addition & 1 deletion docs/cli.md
@@ -32,7 +32,7 @@ The output shows a list of fonts used in the file, with the following tabular da
- Number of pages where it occurs

### `browse`
This command generates HTML output that looks like the raw PDF file with additionnal hyperlinks and information that expose its internal structure and relations between its objects.
This command generates HTML output that looks like the raw PDF file with additional hyperlinks and information that expose its internal structure and relations between its objects.
Redirect the standard output to a file that you can open in your browser:

python3 -m pdfsyntax browse file.pdf > inspection_file.html
4 changes: 2 additions & 2 deletions docs/disassembler.html
@@ -76,13 +76,13 @@ <h3>Grep examples</h3>
</pre>
<p></p>
<h3>Columns</h3>
<p>Most of the columns are collapsable in order to save horizontal space. For example the position (<code>#2</code>) may have a maximum width of 10 digits but for small files the unecessary leading zeros are removed.</p>
<p>Most of the columns are collapsible in order to save horizontal space. For example the position (<code>#2</code>) may have a maximum width of 10 digits but for small files the unnecessary leading zeros are removed.</p>
<table><thead><tr><th> Column </th>
<th> Description </th>
</tr>
</thead>
<tbody><tr><td> <code>1</code> </td>
<td> <code>+</code> for a region with absolute positionning, <code>-</code> for a detail line (xref, /XRef, /ObjStm)</td>
<td> <code>+</code> for a region with absolute positioning, <code>-</code> for a detail line (xref, /XRef, /ObjStm)</td>
</tr>
<tr><td> <code>2</code> </td>
<td> Position, absolute (<code>+</code>) or relative (<code>-</code>)</td>
4 changes: 2 additions & 2 deletions docs/disassembler.md
@@ -82,11 +82,11 @@ Search all mentions of an indirect object, both in itself and in xref:

### Columns

Most of the columns are collapsable in order to save horizontal space. For example the position (`#2`) may have a maximum width of 10 digits but for small files the unecessary leading zeros are removed.
Most of the columns are collapsible in order to save horizontal space. For example the position (`#2`) may have a maximum width of 10 digits but for small files the unnecessary leading zeros are removed.

| Column | Description |
|--------|-------------|
| `1` | `+` for a region with absolute positionning, `-` for a detail line (xref, /XRef, /ObjStm)|
| `1` | `+` for a region with absolute positioning, `-` for a detail line (xref, /XRef, /ObjStm)|
| `2` | Position, absolute (`+`) or relative (`-`)|
| `3` | Size in bytes|
| `4` | Percentage compressed size / plain size|
2 changes: 1 addition & 1 deletion docs/index.html
@@ -26,7 +26,7 @@ <h4>API</h4>
<article class="f">
<img src="cli.png"/>
<h4>CLI</h4>
<p><a href="https://github.com/desgeeko/pdfsyntax/blob/main/docs/cli.md">Get quicky</a> some insights about a PDF file from the command line, like the metadata, the fonts used, the text of the document,...</p>
<p><a href="https://github.com/desgeeko/pdfsyntax/blob/main/docs/cli.md">Get quickly</a> some insights about a PDF file from the command line, like the metadata, the fonts used, the text of the document,...</p>
</article>
<article class="f">
<img src="browse.png"/>
16 changes: 8 additions & 8 deletions docs/introduction_pdf_syntax.html
@@ -261,7 +261,7 @@ <h2>Header</h2>
<p>
The first line of a PDF file is a <code>%PDF-X.Y</code> header.
These numbers indicate the version of the specification the file complies to.
When the numbers are high, "modern" features may be used, but this is not an obligation because of PDF backward compatibily.
When the numbers are high, "modern" features may be used, but this is not an obligation because of PDF backward compatibility.
For example, a PDF 1.2 document is also a valid PDF 1.7 document.
</p>
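
As an illustration of the header rule above, here is a minimal sketch that reads the `%PDF-X.Y` version from the first line of a file. The function name `parse_pdf_version` is an assumption made for this example and is not taken from PDFSyntax; the sketch also assumes single-digit major and minor numbers.

```python
import re

# Matches a header such as b"%PDF-1.7" at the very start of the data.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def parse_pdf_version(first_line: bytes):
    """Hypothetical helper: return (major, minor) or None if no header is found."""
    match = HEADER_RE.match(first_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


version = parse_pdf_version(b"%PDF-1.2\n")
# Tuples compare element by element, so this checks 1.2 <= 1.7.
print(version, version <= (1, 7))
```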
<p>
@@ -340,7 +340,7 @@ <h2>Indirect Objects</h2>
</svg>
</div>
<p>
In the previous example, indirect object #7 contains a payload that is a dictionnary defining 5 key-value pairs.
In the previous example, indirect object #7 contains a payload that is a dictionary defining 5 key-value pairs.
The following section is here to describe most of the object types.
</p>
</section>
@@ -361,17 +361,17 @@ <h2>Object Types</h2>
<p>
And there are collection types :
<ul>
<li><em>Array</em> : an ordered list of atomic objects written bewteen brackets, like <code>[true 800 (ABC) /Something]</code>,</li>
<li><em>Array</em> : an ordered list of atomic objects written between brackets, like <code>[true 800 (ABC) /Something]</code>,</li>
<li><em>Dictionary</em> : a map / associative array of unordered key-value pairs;
all keys must be names, and the object is enclosed in double angle brackets like <code>&lt;&lt; /Key1 (Value1) /Key2 (Value2) &gt;&gt;</code>;
Note that the same separator (for example space or carriage return) may occur bewteen a key and a value and bewteen distinct pairs:
Note that the same separator (for example space or carriage return) may occur between a key and a value and between distinct pairs:
a parser needs to keep a context in order to determine if the next token is a key or a value.</li>
</ul>
</p>
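
The remark about a parser keeping context can be sketched as follows. This is a simplified illustration, assuming the dictionary body has already been split into a flat list of tokens with no nested objects; `pair_tokens` is a hypothetical name, not a PDFSyntax function.

```python
def pair_tokens(tokens: list) -> dict:
    """Hypothetical helper: pair alternating key/value tokens of a flat dictionary."""
    result = {}
    key = None
    expecting_key = True  # the context the parser has to carry between tokens
    for tok in tokens:
        if expecting_key:
            if not tok.startswith("/"):
                raise ValueError(f"dictionary key must be a name, got {tok!r}")
            key = tok
        else:
            # A value may itself be a name, so only the context tells it apart from a key.
            result[key] = tok
        expecting_key = not expecting_key
    if not expecting_key:
        raise ValueError("dictionary ends with a key that has no value")
    return result


print(pair_tokens(["/Type", "/Catalog", "/Key2", "(Value2)"]))
```

Here `/Catalog` looks like a key but is read as a value, purely because of its position.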
<p>
And there is a composite type for content :
<ul>
<li><em>Stream</em> : a dictionnary immediately followed by a sequence of bytes enclosed bewteen the <code>stream</code> and <code>endstream</code> keywords;
<li><em>Stream</em> : a dictionary immediately followed by a sequence of bytes enclosed between the <code>stream</code> and <code>endstream</code> keywords;
It typically conveys either a sequence of commands that write content on a page or a blob used in a sequence of commands (font file, image).</li>
</ul>
</p>
@@ -398,7 +398,7 @@ <h2>Filters</h2>
</div>
<p>
But very often some filter modifies the bytes sequence. A filter may compress the data or encode it, and several may be chained to form a pipeline.
For example a stream dictionnary containing <code>/Filter [/ASCII85Decode /FlateDecode]</code> (besides the mandatory <code>/Length</code> attribute)
For example a stream dictionary containing <code>/Filter [/ASCII85Decode /FlateDecode]</code> (besides the mandatory <code>/Length</code> attribute)
should be decoded from ASCII Base85 into binary and then decompressed with the deflate algorithm.

</p>
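
The `/Filter [/ASCII85Decode /FlateDecode]` pipeline described above can be sketched with the Python standard library alone. This is not the PDFSyntax implementation; `decode_ascii85_flate` is a hypothetical name, and the sketch assumes the stream bytes end with the `~>` end-of-data marker that PDF uses for ASCII85.

```python
import base64
import zlib


def decode_ascii85_flate(data: bytes) -> bytes:
    """Hypothetical helper: apply ASCII85Decode, then FlateDecode."""
    data = data.strip()
    if data.endswith(b"~>"):
        data = data[:-2]  # drop the ASCII85 end-of-data marker
    binary = base64.a85decode(data)  # first filter: ASCII Base85 to binary
    return zlib.decompress(binary)   # second filter: inflate the deflate data


# Build a sample stream by running the pipeline in reverse.
original = b"BT /F1 12 Tf (Hello) Tj ET"
encoded = base64.a85encode(zlib.compress(original)) + b"~>"
print(decode_ascii85_flate(encoded) == original)
```

The filters are applied in the order they are listed in the array, which is the reverse of the order used when encoding.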
@@ -413,7 +413,7 @@ <h2>Cross-Reference Stream</h2>
<li>and the stream content contains a structure specifying the location of indirect objects</li>
</ul>
<p>
This mecanism adds a feature that was not possible with Cross-Reference tables where all objects are accessed with file offset in bytes:
This mechanism adds a feature that was not possible with Cross-Reference tables where all objects are accessed with file offset in bytes:
an indirect object may be located inside another indirect object.
In that case the terminology says that the container is an Object Stream that contains compressed objects.
</p>
@@ -456,7 +456,7 @@ <h2>Document Structure</h2>
<h2>Incremental Updates</h2>
<p>It is possible to build a new revision of a document without writing a whole new file: changes are appended to the original file.
Changes consist in new or modified objects, a Cross-reference, and a <code>startxref</code> that points to it.
The Cross-Reference (either its trailer or its stream dictionary) contains a <code>/Prev</code> attribute thats links the new revision to the original Cross-Reference.
The Cross-Reference (either its trailer or its stream dictionary) contains a <code>/Prev</code> attribute that links the new revision to the original Cross-Reference.
</p>
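
The `/Prev` chaining can be illustrated with a small sketch. It assumes a hypothetical mapping `xrefs` from the byte offset of each Cross-Reference section to its trailer (or stream) dictionary; neither that structure nor the name `revision_chain` comes from PDFSyntax.

```python
def revision_chain(xrefs: dict, startxref: int) -> list:
    """Hypothetical helper: list Cross-Reference offsets from newest to oldest."""
    chain = []
    seen = set()
    offset = startxref
    while offset is not None and offset not in seen:
        seen.add(offset)  # guard against a malformed file with a /Prev loop
        chain.append(offset)
        offset = xrefs[offset].get("/Prev")
    return chain


# Two revisions: the update at offset 1024 points back to the original at 408.
xrefs = {
    408: {"/Size": 8},
    1024: {"/Size": 10, "/Prev": 408},
}
print(revision_chain(xrefs, 1024))
```

Reading starts from the last `startxref` in the file and walks backwards, so the newest revision always comes first.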
<div class="centered">
<svg viewBox="0 1360 600 930">
4 changes: 2 additions & 2 deletions pdfsyntax/api.py
@@ -273,9 +273,9 @@ def build_text_fragments(page_contents: list, f: list):
"""List all text fragments that are part of a page, with their coordinates.

Each list item is another list made of:
- the intial transformation matrix
- the initial transformation matrix
- the text
- the final tranformation matrix
- the final transformation matrix
"""
tfs = []
gs = []
2 changes: 1 addition & 1 deletion pdfsyntax/filters.py
@@ -117,7 +117,7 @@ def encode_stream(stream, stream_def):


def asciihex(stream, columns = None):
"""ASCIIHex encoder augmented with a beautifier (colums and newlines) for DEBUG ONLY."""
"""ASCIIHex encoder augmented with a beautifier (columns and newlines) for DEBUG ONLY."""
if columns is None:
return (binascii.hexlify(stream)).upper()
else:
4 changes: 2 additions & 2 deletions pdfsyntax/markdown.py
@@ -257,7 +257,7 @@ def parse_markdown(lines: list, start_pos = 0, start_indent = 0) -> tuple:


def tags(string: str) -> str:
"""Tranform both style & links."""
"""Transform both style & links."""
return style(link(string))


@@ -269,7 +269,7 @@ def entities(string: str) -> str:


def assemble_html(blocks: list, html = '') -> str:
"""Recusively build HTML string for parsed markdown."""
"""Recursively build HTML string for parsed markdown."""
for typ, items in blocks:
html += f"<{typ.lower()}>"
for x in items: