
Parse weirdly if nested html elements (like a figure element in an anchor) are in one line #27

Not planned
@talatkuyuk

Description


Initial checklist

  • I read the support docs
  • I read the contributing guide
  • I agree to follow the code of conduct
  • I searched issues and discussions and couldn’t find anything (or linked relevant results below)

Affected package

hast-util-raw

Steps to reproduce

Here is a Markdown input containing simple nested HTML elements on one line:

// Current unified, remark, and rehype packages are ESM-only, so use `import` instead of `require`.
import {unified} from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeRaw from 'rehype-raw';
import rehypeStringify from 'rehype-stringify';

const md = `<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>`;

const file = unified()
  .use(remarkParse)
  .use(remarkRehype, { allowDangerousHtml: true }) // keep raw HTML in the tree
  .use(rehypeRaw) // reparse that raw HTML with hast-util-raw
  .use(rehypeStringify)
  .processSync(md);

// `processSync` returns a VFile; convert it to a string to print the HTML.
console.log(String(file));

Actual behavior

If the HTML elements are on one line, the output contains an extra, empty anchor at the start and an empty paragraph at the end:

<p><a href="https://example.com"></a></p><figure><a href="https://example.com"><img src="image.png" alt=""></a></figure><p></p>

But if the input is:

const md = `<a href="https://example.com">
<figure><img src="image.png" alt=""></figure>
</a>`;

it behaves normally and produces the expected result.

Expected behavior

I expect it to handle this kind of simple nested HTML on one line, with this result:

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

In other words, hast-util-raw should handle nested HTML elements even when they are on one line.

Runtime

node

Package manager

npm

Operating system

macos

Build and bundle tools

No response

Activity

wooorm (Member) commented on Apr 18, 2025

If the html elements are in one line, it produces weird anchor behavior and empty paragraph at the end:

Hi! This has to do with Markdown, not with HTML or this package. Here’s your input pasted on the CommonMark dingus: https://spec.commonmark.org/dingus/?text=%3Ca%20href%3D%22https%3A%2F%2Fexample.com%22%3E%3Cfigure%3E%3Cimg%20src%3D%22image.png%22%20alt%3D%22%22%3E%3C%2Ffigure%3E%3C%2Fa%3E%0A%0A%3Ca%20href%3D%22https%3A%2F%2Fexample.com%22%3E%0A%3Cfigure%3E%3Cimg%20src%3D%22image.png%22%20alt%3D%22%22%3E%3C%2Ffigure%3E%0A%3C%2Fa%3E. You should see the same behavior here on GitHub too.

talatkuyuk (Author) commented on Apr 18, 2025

Here’s your input pasted on the CommonMark dingus:

When I open the dingus, I see the input pasted. When I click the HTML tab in the right panel, the HTML result for the one-line input is:

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

That is the expected output, not the weird one. Am I doing something wrong, or am I right about the issue?

wooorm (Member) commented on Apr 19, 2025

It’s about the <p>s being added. There is a difference between the two test cases.
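To make the Markdown-side difference concrete, here is a simplified sketch of two of CommonMark’s HTML-block start conditions (6 and 7). A line that starts an HTML block is passed through as raw HTML without a <p>. Any other line becomes paragraph text whose tags are inline HTML, and that is where the <p> comes from. `startsHtmlBlock`, `BLOCK_TAGS`, and the regexes are hypothetical illustrations, not remark or micromark APIs. The tag set is only a subset of the spec’s list, and conditions 1 to 5 are not checked.

```javascript
// Simplified sketch of CommonMark HTML-block start conditions 6 and 7.
// Hypothetical helper: conditions 1-5 and indentation are ignored, and
// BLOCK_TAGS is only a subset of the condition-6 tag names in the spec.
const BLOCK_TAGS = new Set([
  'address', 'article', 'aside', 'blockquote', 'details', 'div', 'dl',
  'fieldset', 'figcaption', 'figure', 'footer', 'form', 'h1', 'h2', 'h3',
  'h4', 'h5', 'h6', 'header', 'hr', 'li', 'main', 'nav', 'ol', 'p',
  'section', 'table', 'ul'
])

// An attribute: whitespace, a name, and an optional unquoted or quoted value.
const ATTRIBUTE = String.raw`\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:[^\s"'=<>\x60]+|'[^']*'|"[^"]*"))?`
const OPEN_TAG = new RegExp(String.raw`^<[A-Za-z][A-Za-z0-9-]*(?:${ATTRIBUTE})*\s*\/?>`)
const CLOSE_TAG = /^<\/[A-Za-z][A-Za-z0-9-]*\s*>/

// Returns 6 or 7 if `line` starts an HTML block under that condition, else 0
// (the line is then paragraph text, and its tags become inline HTML).
function startsHtmlBlock(line) {
  // Condition 6: `<` or `</`, a block tag name, then whitespace, `>`, `/>`, or end of line.
  const block = /^<\/?([A-Za-z][A-Za-z0-9]*)(?=[\s>]|\/>|$)/.exec(line)
  if (block && BLOCK_TAGS.has(block[1].toLowerCase())) return 6

  // Condition 7: one complete open or closing tag, followed only by whitespace.
  const tag = OPEN_TAG.exec(line) || CLOSE_TAG.exec(line)
  if (tag && /^\s*$/.test(line.slice(tag[0].length))) return 7

  return 0
}

console.log(startsHtmlBlock('<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>')) // 0
console.log(startsHtmlBlock('<a href="https://example.com">')) // 7
console.log(startsHtmlBlock('<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>')) // 6
```

The one-line anchor input is paragraph text, so it is wrapped in a <p>. When `<a href="https://example.com">` stands alone on its line, it starts an HTML block (condition 7) that runs until a blank line, so the multi-line input gets no <p>. A line starting with <figure> starts an HTML block under condition 6, which is why the figure-outer variant is not wrapped in a <p> either.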

wooorm (Member) commented on Apr 19, 2025

Perhaps I am unclear what you mean. Can you maybe make your input/actual/expected examples smaller?

talatkuyuk (Author) commented on Apr 19, 2025

It is not about whether <p> is added. It is about the behavior of the anchor <a> when it is the outer element. Here are two input/actual/expected examples in the simplest form, using the setup below:

import {unified} from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeRaw from 'rehype-raw';
import rehypeStringify from 'rehype-stringify';

// Convert one Markdown `input` string to an HTML string.
function toHtml(input) {
  const file = unified()
    .use(remarkParse)
    .use(remarkRehype, { allowDangerousHtml: true })
    .use(rehypeRaw)
    .use(rehypeStringify)
    .processSync(input);
  return String(file);
}

input markdown (on one line, outer <a>, inner <figure>):

<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>

actual output (weird: an empty anchor inside a <p> at the beginning, and an empty paragraph at the end):

<p><a href="https://example.com"></a></p><figure><a href="https://example.com"><img src="image.png" alt=""></a></figure><p></p>

expected output (this is what the dingus HTML tab shows, and hast-util-raw should produce the same result!):

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

Next, I swapped the nesting order to see what happens:

input markdown (on one line, outer <figure>, inner <a>):

<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>

actual output (as expected, no problem!):

<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>

Note that both inputs are on one line; only the nesting order differs.

wooorm (Member) commented on Apr 21, 2025

Thanks for providing more info! I made your example smaller:

/**
 * @import {Root} from 'hast'
 */

import {raw} from 'hast-util-raw'

/** @type {Root} */
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {type: 'raw', value: '<a>'},
        {type: 'raw', value: '<figure>'},
        {type: 'raw', value: '</figure>'},
        {type: 'raw', value: '</a>'}
      ]
    }
  ]
}

const reformatted = raw(tree)

console.dir(reformatted, {depth: null})

Yields:

{
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'element', tagName: 'a', properties: {}, children: [] }
      ]
    },
    {
      type: 'element',
      tagName: 'figure',
      properties: {},
      children: []
    },
    { type: 'element', tagName: 'p', properties: {}, children: [] }
  ],
  data: { quirksMode: false }
}

Perhaps this smaller example makes it more visible: the “problem” is that there is a <figure> inside a <p>, which HTML does not allow. When <figure> is seen, the a is closed first, and then the p. Then the figure is opened and closed, the stray </a> is ignored, and the stray </p> causes a p to be opened and then immediately closed.

You can see the same behavior in a browser by running this in the console of an empty new tab: document.body.innerHTML = '<p><a><figure></figure></a></p>', and then inspecting the DOM.
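The steps described above can be replayed with a toy model of the HTML tree-construction rules involved: “close a p element”, reconstructing active formatting elements, ignoring an end tag whose formatting element is no longer open, and the stray </p> rule. This is a hypothetical sketch, not the implementation in parse5 (the HTML parser hast-util-raw uses). It only knows p, a, figure, and img, and it treats “in button scope” as “anywhere on the stack”, which is equivalent for these few elements.

```javascript
// Toy model of a few HTML "in body" tree-construction rules, just enough to
// replay the tokens of `<p><a><figure><img></figure></a></p>`.
// Hypothetical helpers; real parsers (such as parse5) handle far more cases,
// e.g. the full adoption agency algorithm.
const VOID = new Set(['img'])

function buildTree(tokens) {
  const root = {name: '#root', children: []}
  const stack = [root] // stack of open elements
  const formatting = [] // list of active formatting elements (only `a` here)

  const current = () => stack[stack.length - 1]
  const has = (name) => stack.some((node) => node.name === name)
  const insert = (name) => {
    const node = {name, children: []}
    current().children.push(node)
    return node
  }
  // Pop elements until one named `name` has been popped.
  const popUntil = (name) => {
    let node
    do {
      node = stack.pop()
    } while (node.name !== name)
  }
  // Reopen formatting elements that were closed implicitly (simplified:
  // every listed element that is no longer open gets a fresh clone).
  const reconstruct = () => {
    formatting.forEach((entry, index) => {
      if (!stack.includes(entry)) {
        const clone = insert(entry.name)
        stack.push(clone)
        formatting[index] = clone
      }
    })
  }

  for (const [kind, name] of tokens) {
    if (kind === 'start') {
      if (name === 'p' || name === 'figure') {
        // "Close a p element": this pops the open `a` along with the `p`.
        if (has('p')) popUntil('p')
        stack.push(insert(name))
      } else if (name === 'a') {
        reconstruct()
        const a = insert('a')
        stack.push(a)
        formatting.push(a)
      } else if (VOID.has(name)) {
        reconstruct() // this is where an `a` reappears inside `figure`
        insert(name) // void elements are never pushed onto the stack
      }
    } else if (name === 'a') {
      const a = formatting.filter((node) => node.name === 'a').pop()
      if (!a) continue
      // Close the `a` if it is still open; either way, drop it from the list.
      // A stray </a> (its element already closed) is thereby ignored.
      if (stack.includes(a)) popUntil('a')
      formatting.splice(formatting.indexOf(a), 1)
    } else if (name === 'p') {
      // Stray </p>: with no open `p`, an empty one is created and closed.
      if (!has('p')) stack.push(insert('p'))
      popUntil('p')
    } else if (has(name)) {
      popUntil(name)
    }
  }

  return root
}

function serialize(node) {
  return node.children
    .map((child) =>
      VOID.has(child.name)
        ? `<${child.name}>`
        : `<${child.name}>${serialize(child)}</${child.name}>`
    )
    .join('')
}

const tokens = [
  ['start', 'p'], ['start', 'a'], ['start', 'figure'], ['start', 'img'],
  ['end', 'figure'], ['end', 'a'], ['end', 'p']
]

console.log(serialize(buildTree(tokens)))
// -> <p><a></a></p><figure><a><img></a></figure><p></p>
```

With the img token, the result has the same shape as the output in the original report. The a is an active formatting element that was closed implicitly, so it is reopened inside the figure before the img is inserted. Without the img token, the result matches the smaller hast example. Without the surrounding p (<a><figure><img></figure></a>), nothing is closed early, which matches the “no problem” case.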

talatkuyuk (Author) commented on May 4, 2025

Thank you @wooorm; you are right.

| | input | result |
| --- | --- | --- |
| a <figure> inside <a> in <p> | <p><a><figure><img></figure></a></p> | I saw the problem in Chrome, as you said |
| a <figure> inside just <a> | <a><figure><img></figure></a> | no problem |

The issue is related to the default HTML parsing behavior. Thanks again. 🙏


Metadata


Labels

👀 no/external: This makes more sense somewhere else
👎 phase/no: Post cannot or will not be acted on
