Right now, Parsoid tries to mimic the p-wrapping semantics of the PHP parser + Tidy, but does not always get them right.
- See T109650.
- Parsoid currently leaves bare text inside <blockquote> in some scenarios (when that text appears on the same wikitext line as the <blockquote> or </blockquote> tag). Tidy, however, wraps these text nodes in <p> tags (probably an HTML4 behavior). HTML5 no longer requires text nodes to be wrapped in p-tags. However, since the line-based processing in doBlockLevels in the PHP parser introduces p-tags in some scenarios but not others, we will likely want <p> tags wrapping all text content in these block tags (blockquote, td, th).
So, once T89331 is resolved, we should rip the block-tag behavior (meant to mimic Tidy) out of the p-wrapping token transformer and instead introduce a DOM pass that wraps bare text found in the DOM (either as children of <body> or of "block" nodes such as <blockquote>, <td>, <th>).
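A minimal sketch of such a DOM pass, written in Python purely for illustration (it is not Parsoid code). The Text/Element tree, the wrap_bare_text and to_html helpers, and the INLINE_TAGS list are hypothetical simplifications, not Parsoid's DOM or API. The pass walks the tree bottom-up and, inside <body>, <blockquote>, <td> and <th>, groups each run of bare text and inline siblings into a new <p>, leaving existing block children and whitespace-only runs untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Text:
    data: str


@dataclass
class Element:
    tag: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Text, Element]

# Containers whose bare text should end up inside <p> tags.
WRAP_CONTAINERS = {"body", "blockquote", "td", "th"}

# Hypothetical, illustrative subset of HTML phrasing (inline) elements;
# a real pass would need the full list.
INLINE_TAGS = {"a", "abbr", "b", "br", "code", "em", "i",
               "small", "span", "strong", "sub", "sup"}


def is_inline(node: Node) -> bool:
    return isinstance(node, Text) or node.tag in INLINE_TAGS


def has_content(run: List[Node]) -> bool:
    # A run consisting only of whitespace text is left unwrapped.
    return any(not isinstance(n, Text) or n.data.strip() for n in run)


def wrap_bare_text(node: Node) -> None:
    """Wrap runs of bare text/inline nodes in <p> inside block containers."""
    if isinstance(node, Text):
        return
    # Process descendants first so nested containers are handled too.
    for child in node.children:
        wrap_bare_text(child)
    if node.tag not in WRAP_CONTAINERS:
        return

    new_children: List[Node] = []
    run: List[Node] = []

    def flush() -> None:
        # Close the current inline run, wrapping it if it has real content.
        if has_content(run):
            new_children.append(Element("p", list(run)))
        else:
            new_children.extend(run)
        run.clear()

    for child in node.children:
        if is_inline(child):
            run.append(child)
        else:
            flush()
            new_children.append(child)
    flush()
    node.children = new_children


def to_html(node: Node) -> str:
    # Naive serializer for illustration only (no escaping, no attributes).
    if isinstance(node, Text):
        return node.data
    inner = "".join(to_html(c) for c in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


# A <td> where "foo" is bare text but "bar" and "baz" are already p-wrapped.
td = Element("td", [Text("foo"),
                    Element("p", [Text("bar")]),
                    Element("p", [Text("baz")])])
wrap_bare_text(td)
print(to_html(td))  # <td><p>foo</p><p>bar</p><p>baz</p></td>
```

Grouping inline siblings into one run, rather than wrapping each text node separately, keeps text like `a <b>bold</b> c` in a single paragraph. A real pass would also need the full list of HTML phrasing-content elements and more careful whitespace handling.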
If the same pass is implemented in the PHP parser as well, the PHP parser and Parsoid can produce identical output. This could also remove an odd behavior in the current PHP+Tidy output, where wikitext such as "foo\n\nbar\n\nbaz" inside a <td> leaves foo as bare text but wraps bar and baz in p-tags (which is essentially why the PHP parser and Parsoid produce different output for T109650).
However, independent of what we do with the PHP parser, it makes sense to move some of Parsoid's p-wrapping into a DOM pass that runs after HTML generation, to make p-wrapping more consistent.