Remove Unnecessary Attributes From Html Tag Using Javascript Regex

I'm a newbie to regular expressions, trying to filter HTML tags so that only the required attributes (src / href / style) are kept with their values and the unnecessary attributes are removed. While go

Solution 1:

@AhmadAhsan, here is a demo that fixes your issue using DOM manipulation instead of a regex: https://jsfiddle.net/pu1hsdgn/

<script src="https://code.jquery.com/jquery-1.9.1.js"></script>
<script>
    // Attributes that are allowed to stay on every element.
    var whitelist = ["src", "href", "style"];

    $(document).ready(function () {
        function foo(contents) {
            // Load the markup into a detached <div> so the browser parses it into a DOM.
            var temp = $(document.createElement('div')).html(contents);

            temp.find('*').each(function () {
                var attributes = this.attributes;
                var i = attributes.length;
                // Walk backwards because removing an attribute shrinks the live list.
                while (i--) {
                    var attr = attributes[i];
                    if ($.inArray(attr.name, whitelist) === -1) {
                        this.removeAttributeNode(attr);
                    }
                }
            });
            return temp.html();
        }

        var raw = '<title>Hello World</title><div style="margin:0px;" fadeout"="" class="xyz"><img src="abc.jpg" alt="" /><p style="margin-bottom:10px;">The event is celebrating its 50th anniversary K&ouml;&nbsp;<a href="http://www.germany.travel/" style="margin:0px;">exhibition grounds in Cologne</a>.</p><p style="padding:0px;"></p><p style="color:black;"><strong>A festival for art lovers</strong></p></div>';
        alert(foo(raw));
    });
</script>

Solution 2:

Here you go, based on your original regex:

<([a-z][a-z0-9]*?)(?:[^>]*?((?:\s(?:src|href|style)=['\"][^'\"]*['\"]){0,3}))[^>]*?(\/?)>

Group 1 is the tag name, group 2 holds the allowed attributes, and group 3 is the trailing / if there is one. I couldn't get it to work when non-allowed attributes are interleaved with allowed ones, e.g. <a href="foo" class="bar" src="baz" />, where the allowed attributes after the first non-allowed one are dropped. I don't think it can be done with a single pattern like this.
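To make the three groups and the interleaving problem concrete, here is a small sketch that runs the pattern above against the <a> example from this answer. The names pattern, sample, match and rebuilt are only illustrative, and the tag is rebuilt by hand from the groups purely for demonstration.

```javascript
// Pattern copied from the answer; the sample tag is the interleaved example from the text.
const pattern = /<([a-z][a-z0-9]*?)(?:[^>]*?((?:\s(?:src|href|style)=['\"][^'\"]*['\"]){0,3}))[^>]*?(\/?)>/;
const sample = '<a href="foo" class="bar" src="baz" />';

const match = pattern.exec(sample);
console.log(match[1]); // group 1, the tag name: a
console.log(match[2]); // group 2, the allowed attributes that were captured: href="foo" (with a leading space)
console.log(match[3]); // group 3, the self-closing slash: /

// Rebuilding the tag from the three groups removes class, but src is lost as well,
// because group 2 stops at the first non-allowed attribute.
const rebuilt = `<${match[1]}${match[2]}${match[3]}>`;
console.log(rebuilt); // <a href="foo"/>
```

The src attribute never reaches group 2 because the repeated attribute group only matches a contiguous run of allowed attributes.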

Edit: Per @AhmadAhsan's corrections below, the regex should be:

var html = `<div fadeout"="" style="margin:0px;" class="xyz">
                <img src="abc.jpg" alt="" />
                <p style="margin-bottom:10px;">
                    The event is celebrating its 50th anniversary K&ouml;&nbsp;
                    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
                </p>
                <p style="padding:0px;"></p>
                <p style="color:black;">
                    <strong>A festival for art lovers</strong>
                </p>
            </div>`;

// Assumption: the replacement rebuilds each tag from groups 1-3,
// and the g and i flags apply it to every tag regardless of case.
console.log(
  html.replace(/<([a-z][a-z0-9]*)(?:[^>]*?((?:\s(?:src|href|style)=['\"][^'\"]*['\"]){0,3}))[^>]*?(\/?)>/gi, '<$1$2$3>')
);
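Because group 2 of the pattern only captures allowed attributes that sit directly after the tag name, interleaved cases such as <a href="foo" class="bar" src="baz" /> still lose attributes. One possible workaround, which is not from the original answers, is to split the job into two passes: an outer replace that finds each opening tag, and an inner loop that picks the allowed attributes out of that tag. The sketch below assumes quoted attribute values and no > inside a value; stripAttributes and ALLOWED are hypothetical names.

```javascript
// Hypothetical helper, not part of the original answers.
const ALLOWED = ["src", "href", "style"];

function stripAttributes(html, allowed = ALLOWED) {
  // Pass 1: match every opening tag. Group 1 is the tag name, group 2 the raw attribute text.
  // Closing tags and comments are skipped because "<" must be followed by a letter.
  return html.replace(/<([a-z][^\s\/>]*)([^>]*)>/gi, (whole, tag, rest) => {
    // Remember whether the tag ended with "/" so it can be put back.
    const selfClosing = /\/\s*$/.test(rest) ? " /" : "";
    const kept = [];
    // Pass 2: walk over name="value" or name='value' pairs in any order.
    const attrRe = /([^\s"'=\/]+)\s*=\s*("[^"]*"|'[^']*')/g;
    let m;
    while ((m = attrRe.exec(rest)) !== null) {
      if (allowed.includes(m[1].toLowerCase())) {
        kept.push(`${m[1]}=${m[2]}`);
      }
    }
    const attrs = kept.length ? " " + kept.join(" ") : "";
    return `<${tag}${attrs}${selfClosing}>`;
  });
}

console.log(stripAttributes('<a href="foo" class="bar" src="baz" />'));
// <a href="foo" src="baz" />
```

This is still string matching rather than real HTML parsing, so for untrusted or malformed markup the DOM-based approach in Solution 1 is the safer choice.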
    
