JavaScript: convert HTML to text with line breaks

Basically, I just need the effect of copying that HTML from a browser window and pasting it into a textarea element.

For example I want this:

Some

text
Some
text

to become this:

Some
text
Some
text

asked Sep 28, 2010 at 13:26

Danylo Mysak



If that HTML is visible within your web page, you could do it with the user selection (or just a TextRange in IE). This does preserve line breaks, if not necessarily leading and trailing white space.

UPDATE 10 December 2012

However, the toString() method of Selection objects is not yet standardized and works inconsistently between browsers, so this approach is based on shaky ground and I don't recommend using it now. I would delete this answer if it weren't accepted.

Demo: http://jsfiddle.net/wv49v/

Code:

function getInnerText(el) {
    var sel, range, innerText = "";
    if (typeof document.selection != "undefined" && typeof document.body.createTextRange != "undefined") {
        // Older IE: position a TextRange over the element and read its text
        range = document.body.createTextRange();
        range.moveToElementText(el);
        innerText = range.text;
    } else if (typeof window.getSelection != "undefined" && typeof document.createRange != "undefined") {
        // Other browsers: select the element's contents (replacing any
        // existing selection) and convert the Selection to a string
        sel = window.getSelection();
        sel.selectAllChildren(el);
        innerText = "" + sel;
        // Clear the temporary selection again
        sel.removeAllRanges();
    }
    return innerText;
}

answered Sep 28, 2010 at 13:57

Tim Down



I tried to find some code I wrote for this a while back. It worked nicely. Let me outline what it did, and hopefully you can duplicate its behavior.

  • Replace images with alt or title text.
  • Replace links with "text[link]"
  • Replace things that generally produce vertical white space (h2-h6, div, p, br, hr, etc.) with line breaks. (I know, I know. These could actually be inline elements, but it works out well.)
  • Strip out the rest of the tags and replace with an empty string.

You could even expand this more to format things like ordered and unordered lists. It really just depends on how far you'll want to go.
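The outline above can be sketched in JavaScript with regular expressions. This is a minimal sketch, not the answerer's own code: it assumes simple, well-formed markup with double-quoted attribute values, leaves entities such as &amp; undecoded, and the function name htmlToPlainText is hypothetical.

```javascript
// Regex-only sketch of the four steps. Assumes simple, well-formed markup
// with double-quoted attribute values; it is not a real HTML parser.
function htmlToPlainText(html) {
  let text = html;

  // 1. Replace images with their alt text, falling back to the title text.
  text = text.replace(/<img\b[^>]*>/gi, (tag) => {
    const alt = /\salt="([^"]*)"/i.exec(tag);
    const title = /\stitle="([^"]*)"/i.exec(tag);
    return alt ? alt[1] : title ? title[1] : "";
  });

  // 2. Replace links with "text[link]".
  text = text.replace(
    /<a\b[^>]*\shref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi,
    (match, href, inner) => `${inner}[${href}]`
  );

  // 3. Turn elements that usually produce vertical white space into newlines:
  //    <br> and <hr> themselves, plus the closing tags of headings, div and p.
  text = text.replace(/<(?:br|hr)\b[^>]*>/gi, "\n");
  text = text.replace(/<\/(?:h[1-6]|div|p)>/gi, "\n");

  // 4. Strip every remaining tag.
  text = text.replace(/<[^>]*>/g, "");

  return text;
}
```

For example, htmlToPlainText('<p>Some</p><div>text<br>Some</div>') returns "Some\ntext\nSome\n". Lists, tables and entity decoding would be the next things to add if you need them.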

EDIT

Found the code!

public static string Convert(string template)
{
    template = Regex.Replace(template, "


I made a function based on this answer: https://stackoverflow.com/a/42254787/3626940

function htmlToText(html){
    //remove code breaks and tabs
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    //turn HTML breaks into \n and table cells into \t
    html = html.replace(/<\/td>/g, "\t");
    html = html.replace(/<\/table>/g, "\n");
    html = html.replace(/<\/tr>/g, "\n");
    html = html.replace(/<\/p>/g, "\n");
    html = html.replace(/<\/div>/g, "\n");
    html = html.replace(/<\/h[1-6]>/g, "\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br\s*\/>/g, "\n");

    //parse html into text
    var dom = (new DOMParser()).parseFromString(html, 'text/html');
    return dom.body.textContent;
}

answered Jun 12, 2018 at 17:16

chrmcpn



Based on chrmcpn's answer: I needed to convert a basic HTML email template into a plain-text version as part of a Node.js build script. I had to use JSDOM to make it work, but here's my code:

const { JSDOM } = require("jsdom");

const htmlToText = (html) => {
    html = html.replace(/\n/g, "");
    html = html.replace(/\t/g, "");

    html = html.replace(/<\/p>/g, "\n\n");
    html = html.replace(/<\/h2>/g, "\n\n");
    html = html.replace(/<br>/g, "\n");
    html = html.replace(/<br\s*\/>/g, "\n");

    const dom = new JSDOM(html);
    let text = dom.window.document.body.textContent;

    // collapse runs of indentation spaces (removing every space would glue words together)
    text = text.replace(/ {2,}/g, " ");
    text = text.replace(/\n /g, "\n");
    text = text.trim();
    return text;
};

answered Mar 5, 2019 at 13:16

holm50


Three steps.

First, get the HTML as a string.
Second, replace all 
and
with \r\n. Third, use the regular expression "<(.|\n)*?>" to replace all remaining markup with an empty string.
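A minimal JavaScript sketch of these three steps. The tags replaced in step two are an assumption here: the sketch treats them as <br> line breaks in their usual spellings (<br>, <br/>, <br />). The function name stripMarkup is hypothetical.

```javascript
// Sketch of the three steps. ASSUMPTION: step 2 targets <br> tags
// (<br>, <br/>, <br />).
function stripMarkup(html) {
  // Step 1: start from the HTML as a string (e.g. element.innerHTML in a browser).
  let text = String(html);

  // Step 2: turn line-break tags into \r\n.
  text = text.replace(/<br\s*\/?>/gi, "\r\n");

  // Step 3: remove all remaining markup; (.|\n) lets a tag span several lines.
  text = text.replace(/<(.|\n)*?>/g, "");

  return text;
}
```

Entities such as &amp; are not decoded. Also note that . does not match \r, so a tag broken across a \r\n line ending is left in place; <[^>]*> avoids that.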

answered Sep 28, 2010 at 13:37

Serapth



How do you convert HTML to formatted plain text in JavaScript?

The easiest way is to strip all the HTML tags with JavaScript's replace() method: a regular expression finds every tag enclosed in angle brackets and replaces it with a space. var text = html.
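A minimal sketch of that replace() approach, assuming the input is an HTML string; the function name stripTags is hypothetical. Replacing each tag with a space rather than an empty string keeps words from adjacent elements apart; runs of spaces are then collapsed while newlines are kept.

```javascript
// Strip every <...> tag, replacing it with a space so that text from
// neighbouring elements does not run together. Newlines are preserved;
// entities such as &amp; are left undecoded.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, " ") // each tag becomes one space
    .replace(/[ \t]+/g, " ")  // collapse runs of spaces and tabs
    .trim();                  // drop leading/trailing whitespace
}
```

For example, stripTags("<td>Name</td><td>Age</td>") returns "Name Age", whereas replacing the tags with "" would give "NameAge".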

How do you convert HTML to plain text?

  • Convert HTML file to a text file (preserving HTML code and text).
  • Click the File tab again, then click the Save as option.
  • In the Save as type drop-down list, select the Plain Text (*.txt) option. ...
  • Click the Save button to save as a text document.

How do you preserve line breaks in HTML?

The <pre> tag defines preformatted text. Text in a <pre> element is displayed in a fixed-width font, and the text preserves both spaces and line breaks.
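To show converted plain text in a page with its spaces and line breaks intact, one option is to escape it and wrap it in <pre>. A minimal sketch; the function name toPreformattedHtml is hypothetical.

```javascript
// Escape the characters HTML treats specially, then wrap the text in <pre>
// so its spaces and line breaks are rendered as-is.
function toPreformattedHtml(text) {
  const escaped = text
    .replace(/&/g, "&amp;") // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<pre>${escaped}</pre>`;
}
```

In a browser, assigning the text to an element's textContent and styling it with white-space: pre (or pre-wrap) achieves the same without manual escaping.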

Can you convert HTML to JavaScript?

Insert your HTML text into the text box by typing it or by pasting it. Then, to convert it to JavaScript that is usable in an HTML document, click the 'Convert HTML -> JavaScript' button; the converted code will appear in the same box. The 'Clear Text' button erases everything in the text box.