Programming Languages
C#
Subjective
Mar 15, 2013
Assuming $_ contains HTML, which of the following substitutions will remove all tags in it?
1.s/<.*>//g;
2.s/<.*?>//gs;
3.s/<\/?[A-Z]\w*(?:\s+[A-Z]\w*(?:\s*=\s*(?:(["']).*?\1|[\w-.]+))?)*\s*>//gsix;
Detailed Explanation
If it weren't for HTML comments, improperly formatted HTML, and tags with interesting data like < SCRIPT >, you could do this. Alas, you cannot. It takes a lot more smarts, and quite frankly, a real parser.
Discussion (0)
No comments yet. Be the first to share your thoughts!
Share Your Thoughts