desgeeko · kianmeng · Feb 26, 2025
diff --git a/README.md b/README.md
@@ -59,7 +59,7 @@ PDFSyntax is mostly made of simple functions. Example:
 
 The Doc object is probably the only dedicated class you will need to handle. It is a black box that stores all the internal states of a document:
 - content that is cached/memoized from an original file,
-- modifications that add/modifiy/delete content and that are tracked as incremental updates.
+- modifications that add/modify/delete content and that are tracked as incremental updates.
 
 ```Python
 >>> doc
@@ -97,6 +97,6 @@ You then can write the modified PDF to disk. Note that the resulting file contai
 ## Open-Source, not Open-Contribution yet
 
 PDFSyntax is [MIT licensed](https://github.com/desgeeko/pdfsyntax/blob/main/LICENCE) but is currently closed to contributions.
-> Personal note: this is a pet projet of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. 
+> Personal note: this is a pet project of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. 
 
 
diff --git a/docs/README.html b/docs/README.html
@@ -50,7 +50,7 @@ <h2>API overview</h2>
 </pre>
 <p>The Doc object is probably the only dedicated class you will need to handle. It is a black box that stores all the internal states of a document:</p>
 <ul><li>content that is cached/memoized from an original file,</li>
-<li>modifications that add/modifiy/delete content and that are tracked as incremental updates.</li>
+<li>modifications that add/modify/delete content and that are tracked as incremental updates.</li>
 </ul>
 <pre>&gt;&gt;&gt; doc
 &lt;PDF Doc in revision 1 with 0 modified object(s)&gt;
@@ -72,7 +72,7 @@ <h2>API overview</h2>
 <p></p>
 <h2>Open-Source, not Open-Contribution yet</h2>
 <p>PDFSyntax is <a href='https://github.com/desgeeko/pdfsyntax/blob/main/LICENCE'>MIT licensed</a> but is currently closed to contributions.</p>
-<blockquote><p> Personal note: this is a pet projet of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. </p>
+<blockquote><p> Personal note: this is a pet project of mine and my time is limited. First I need to focus on my roadmap (new features and refactoring) and then I will happily accept contributions when everything is a little more stabilised. </p>
 </blockquote>
 <p></p>
 

diff --git a/docs/api.html b/docs/api.html
@@ -48,7 +48,7 @@ <h3>Low-level access to object tree</h3>
 <p><code>1j</code> is a complex number (!) representing indirect reference <code>1 0 R</code>. Why? Because the approach is to map PDF object types to Python basic built-in types as much as possible, and it is a concise way to show both the object number (as the imaginary part) and the generation number (as the real part). Moreover the generation is very often equal to zero, so the real part is not shown.You may think of the <code>j</code> as a "jump" to another object :)</p>
 <p><code>get_object</code> gives direct access to indirect objects.</p>
 <pre>&gt;&gt;&gt; #Access to document catalog, given that the trailer redirects to 1j for root
-&gt;&gt;&gt; #(equivalent to catalog fonction)
+&gt;&gt;&gt; #(equivalent to catalog function)
 &gt;&gt;&gt; doc.get_object(1j)
 {'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
 </pre>
@@ -62,7 +62,7 @@ <h3>Pages</h3>
 &gt;&gt;&gt; #(In this example, nothing is inherited from upper nodes)
 </pre>
 <p>The <code>page</code> function goes further by merging inherited attributes with local attributes of each page and giving the result in a list.</p>
-<pre>&gt;&gt;&gt; #Equivalent list with computed page attribues
+<pre>&gt;&gt;&gt; #Equivalent list with computed page attributes
 &gt;&gt;&gt; pdf.pages(doc)
 [{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j},
   '/Contents': 5j,

diff --git a/docs/api.md b/docs/api.md
@@ -53,7 +53,7 @@ You may think of the `j` as a "jump" to another object :)
 
 ```Python
 >>> #Access to document catalog, given that the trailer redirects to 1j for root
->>> #(equivalent to catalog fonction)
+>>> #(equivalent to catalog function)
 >>> doc.get_object(1j)
 {'/Pages': 3j, '/Outlines': 2j, '/Type': '/Catalog'}
 ```
@@ -73,7 +73,7 @@ Page index is a tree structure where attributes can be inherited from parent nod
 The `page` function goes further by merging inherited attributes with local attributes of each page and giving the result in a list.
 
 ```Python
->>> #Equivalent list with computed page attribues
+>>> #Equivalent list with computed page attributes
 >>> pdf.pages(doc)
 [{'/Resources': {'/Font': {'/F1': 7j}, '/ProcSet': 6j},
   '/Contents': 5j,

diff --git a/docs/cli.html b/docs/cli.html
@@ -38,7 +38,7 @@ <h3><code>fonts</code></h3>
 <li>Number of pages where it occurs</li>
 </ul>
 <h3><code>browse</code></h3>
-<p>This command generates HTML output that looks like the raw PDF file with additionnal hyperlinks and information that expose its internal structure and relations between its objects.Redirect the standard output to a file that you can open in your browser:</p>
+<p>This command generates HTML output that looks like the raw PDF file with additional hyperlinks and information that expose its internal structure and relations between its objects. Redirect the standard output to a file that you can open in your browser:</p>
 <pre>    python3 -m pdfsyntax browse file.pdf &gt; inspection_file.html
 </pre>
 <p>Please refer to the <a href='https://github.com/desgeeko/pdfsyntax/blob/main/docs/browse.md'>Browse article</a> for details.</p>

diff --git a/docs/cli.md b/docs/cli.md
@@ -32,7 +32,7 @@ The output shows a list of fonts used in the file, with the following tabular da
 - Number of pages where it occurs
 
 ### `browse`
-This command generates HTML output that looks like the raw PDF file with additionnal hyperlinks and information that expose its internal structure and relations between its objects.
+This command generates HTML output that looks like the raw PDF file with additional hyperlinks and information that expose its internal structure and relations between its objects.
 Redirect the standard output to a file that you can open in your browser:
 
     python3 -m pdfsyntax browse file.pdf > inspection_file.html

diff --git a/docs/disassembler.html b/docs/disassembler.html
@@ -76,13 +76,13 @@ <h3>Grep examples</h3>
 </pre>
 <p></p>
 <h3>Columns</h3>
-<p>Most of the columns are collapsable in order to save horizontal space. For example the position (<code>#2</code>) may have a maximum width of 10 digits but for small files the unecessary leading zeros are removed.</p>
+<p>Most of the columns are collapsible in order to save horizontal space. For example the position (<code>#2</code>) may have a maximum width of 10 digits but for small files the unnecessary leading zeros are removed.</p>
 <table><thead><tr><th> Column </th>
 <th> Description </th>
 </tr>
 </thead>
 <tbody><tr><td> <code>1</code>     </td>
-<td> <code>+</code> for a region with absolute positionning, <code>-</code> for a detail line (xref, /XRef, /ObjStm)</td>
+<td> <code>+</code> for a region with absolute positioning, <code>-</code> for a detail line (xref, /XRef, /ObjStm)</td>
 </tr>
 <tr><td> <code>2</code>     </td>
 <td> Position, absolute (<code>+</code>) or relative (<code>-</code>)</td>

diff --git a/docs/disassembler.md b/docs/disassembler.md
@@ -82,11 +82,11 @@ Search all mentions of an indirect object, both in itself and in xref:
 
 ### Columns
 
-Most of the columns are collapsable in order to save horizontal space. For example the position (`#2`) may have a maximum width of 10 digits but for small files the unecessary leading zeros are removed.
+Most of the columns are collapsible in order to save horizontal space. For example the position (`#2`) may have a maximum width of 10 digits but for small files the unnecessary leading zeros are removed.
 
 | Column | Description |
 |--------|-------------|
-| `1`     | `+` for a region with absolute positionning, `-` for a detail line (xref, /XRef, /ObjStm)|
+| `1`     | `+` for a region with absolute positioning, `-` for a detail line (xref, /XRef, /ObjStm)|
 | `2`     | Position, absolute (`+`) or relative (`-`)|
 | `3`     | Size in bytes|
 | `4`     | Percentage compressed size / plain size|

diff --git a/docs/index.html b/docs/index.html
@@ -26,7 +26,7 @@ <h4>API</h4>
       <article class="f">
 	<img src="cli.png"/>
         <h4>CLI</h4>
-	<p><a href="https://github.com/desgeeko/pdfsyntax/blob/main/docs/cli.md">Get quicky</a> some insights about a PDF file from the command line, like the metadata, the fonts used, the text of the document,...</p>
+	<p><a href="https://github.com/desgeeko/pdfsyntax/blob/main/docs/cli.md">Get quickly</a> some insights about a PDF file from the command line, like the metadata, the fonts used, the text of the document,...</p>
       </article>
       <article class="f">
 	<img src="browse.png"/>

diff --git a/docs/introduction_pdf_syntax.html b/docs/introduction_pdf_syntax.html
@@ -261,7 +261,7 @@ <h2>Header</h2>
             <p>
                 The first line of a PDF file is a <code>%PDF-X.Y</code> header. 
                 These numbers indicate the version of the specification the file complies to.
-                When the numbers are high, "modern" features may be used, but this is not an obligation because of PDF backward compatibily.
+                When the numbers are high, "modern" features may be used, but this is not an obligation because of PDF backward compatibility.
                 For example, a PDF 1.2 document is also a valid PDF 1.7 document.
             </p>
             <p>
@@ -340,7 +340,7 @@ <h2>Indirect Objects</h2>
                 </svg>
             </div>
             <p>
-                In the previous example, indirect object #7 contains a payload that is a dictionnary defining 5 key-value pairs.
+                In the previous example, indirect object #7 contains a payload that is a dictionary defining 5 key-value pairs.
                 The following section is here to describe most of the object types.
             </p>
     </section>
@@ -361,17 +361,17 @@ <h2>Object Types</h2>
             <p>
                 And there are collection types : 
                 <ul>
-                <li><em>Array</em> : an ordered list of atomic objects written bewteen brackets, like <code>[true 800 (ABC) /Something]</code>,</li>
+                <li><em>Array</em> : an ordered list of atomic objects written between brackets, like <code>[true 800 (ABC) /Something]</code>,</li>
                 <li><em>Dictionary</em> : a map / associative array of unordered key-value pairs; 
                                         all keys must be names, and the object is enclosed in double angle brackets like <code>&lt;&lt; /Key1 (Value1) /Key2 (Value2) &gt;&gt;</code>;
-                                    Note that the same separator (for example space or carriage return) may occur bewteen a key and a value and bewteen distinct pairs: 
+                                    Note that the same separator (for example space or carriage return) may occur between a key and a value and between distinct pairs: 
                                     a parser needs to keep a context in order to determine if the next token is a key or a value.</li>
                 </ul>
             </p>
             <p>
                 And there is a composite type for content : 
                 <ul>
-                <li><em>Stream</em> : a dictionnary immediately followed by a sequence of bytes enclosed bewteen the <code>stream</code> and <code>endstream</code> keywords; 
+                <li><em>Stream</em> : a dictionary immediately followed by a sequence of bytes enclosed between the <code>stream</code> and <code>endstream</code> keywords; 
                                     It typically conveys either a sequence of commands that write content on a page or a blob used in a sequence of commands (font file, image).</li>
                 </ul>
             </p>
@@ -398,7 +398,7 @@ <h2>Filters</h2>
             </div>
             <p>
                 But very often some filter modifies the bytes sequence. A filter may compress the data or encode it, and several may be chained to form a pipeline.
-                For example a stream dictionnary containing <code>/Filter [/ASCII85Decode /FlateDecode]</code> (besides the mandatory <code>/Length</code> attribute) 
+                For example a stream dictionary containing <code>/Filter [/ASCII85Decode /FlateDecode]</code> (besides the mandatory <code>/Length</code> attribute) 
                 should be decoded from ASCII Base85 into binary and then decompressed with the deflate algorithm. 
 
             </p>
@@ -413,7 +413,7 @@ <h2>Cross-Reference Stream</h2>
                         <li>and the stream content contains a structure specifying the location of indirect objects</li>
                         </ul>
                         <p>
-                            This mecanism adds a feature that was not possible with Cross-Reference tables where all objects are accessed with file offset in bytes: 
+                            This mechanism adds a feature that was not possible with Cross-Reference tables where all objects are accessed with file offset in bytes: 
                     an indirect object may be located inside another indirect object. 
                     In that case the terminology says that the container is an Object Stream that contains compressed objects.
                 </p>
@@ -456,7 +456,7 @@ <h2>Document Structure</h2>
         <h2>Incremental Updates</h2>
             <p>It is possible to build a new revision of a document without writing a whole new file: changes are appended to the original file.
              Changes consist in new or modified objects, a Cross-reference, and a <code>startxref</code> that points to it. 
-             The Cross-Reference (either its trailer or its stream dictionary) contains a <code>/Prev</code> attribute thats links the new revision to the original Cross-Reference. 
+             The Cross-Reference (either its trailer or its stream dictionary) contains a <code>/Prev</code> attribute that links the new revision to the original Cross-Reference. 
     </p>
             <div class="centered">
                 <svg viewBox="0 1360 600 930">

diff --git a/pdfsyntax/api.py b/pdfsyntax/api.py
@@ -273,9 +273,9 @@ def build_text_fragments(page_contents: list, f: list):
     """List all text fragmemts that are part of a page, with their coordinates.
 
     Each list item is another list made of:
-    - the intial transformation matrix
+    - the initial transformation matrix
     - the text
-    - the final tranformation matrix
+    - the final transformation matrix
     """
     tfs = []
     gs = []

diff --git a/pdfsyntax/filters.py b/pdfsyntax/filters.py
@@ -117,7 +117,7 @@ def encode_stream(stream, stream_def):
 
 
 def asciihex(stream, columns = None):
-    """ASCIIHex encoder augmented with a beautifier (colums and newlines) for DEBUG ONLY."""
+    """ASCIIHex encoder augmented with a beautifier (columns and newlines) for DEBUG ONLY."""
     if columns is None:
         return (binascii.hexlify(stream)).upper()
     else:

diff --git a/pdfsyntax/markdown.py b/pdfsyntax/markdown.py
@@ -257,7 +257,7 @@ def parse_markdown(lines: list, start_pos = 0, start_indent = 0) -> tuple:
 
 
 def tags(string: str) -> str:
-    """Tranform both style & links."""
+    """Transform both style & links."""
     return style(link(string))
 
 
@@ -269,7 +269,7 @@ def entities(string: str) -> str:
 
 
 def assemble_html(blocks: list, html = '') -> str:
-    """Recusively build HTML string for parsed markdown."""
+    """Recursively build HTML string for parsed markdown."""
     for typ, items in blocks:
         html += f"<{typ.lower()}>"
         for x in items: