[BNM] Regex help, remove p tags and all within

David Pashley david at davidpashley.com
Thu May 1 01:26:19 BST 2008


On Apr 30, 2008 at 23:31, delarge praised the llamas by saying:
> Yo yo BNM
> 
> I'm no regex expert - and I need to do this one thing, I have read up on some
> regex basics, but can't seem to find this scenario...
> 
> I need to replace or rather delete everything that's between <p> tags...
> 
> So this:
> -------------------------------------
> <a href="http://website.com/page/35466"><img src="http://website.com/images/amazingpic.jpg" /></a>
> <p>Once upon a time <a href="">I went</a> to see a quaint quail.</p>
> <p class="words"><a href="/words/yo">Yo</a> <a
> href="/words/sugar">sugar</a></p>
> -------------------------------------
> 
> Becomes just:
> -------------------------------------
> <a href="http://website.com/page/35466"><img src="http://website.com/images/amazingpic.jpg" /></a>
> -------------------------------------
> 
> This is for manipulating an RSS feed in Yahoo Pipes, so it's not a PHP
> scenario or similar... I have been able to delete the p tags themselves, but
> not what is contained within them, which changes with every item.
> 
> Any help would be greatly appreciated.
> 
A naive solution would be s/<p(\s[^>]*)?>.*?<\/p>//g, where the
(\s[^>]*)? part lets it match <p class="words"> as well as a bare <p>.
However, it's likely that you will have to deal with newlines. By
default . doesn't match a newline, so a match can't run past the end of
a line; you need to tell the regex engine to let . match newlines too.
In Perl you'd do this using the s modifier (s/foo/bar/s; that's the
"single-line" modifier, not m, which only changes what ^ and $ match),
but I don't know if Yahoo Pipes has that option. Given that Yahoo Pipes
is dealing with RSS, doesn't it have the option of running XSLT against
feeds? That would be a better way of doing it.
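A minimal sketch of the same idea in Python rather than Yahoo Pipes
(whether the Pipes regex module treats . and newlines the same way is an
assumption). It runs the attribute-tolerant pattern over the sample item
from the question, with the wrapped img URL rejoined, once without and
once with re.DOTALL, which is Python's counterpart to Perl's s modifier.
The variable names are hypothetical, just for illustration.

```python
import re

# Sample item from the question; the second <p> element deliberately
# spans two lines, which is what trips up the naive pattern.
html = '''<a href="http://website.com/page/35466"><img src="http://website.com/images/amazingpic.jpg" /></a>
<p>Once upon a time <a href="">I went</a> to see a quaint quail.</p>
<p class="words"><a href="/words/yo">Yo</a> <a
href="/words/sugar">sugar</a></p>'''

# <p(\s[^>]*)?> matches <p> and <p class="...">, but not <pre> or <param>.
# .*? is non-greedy, so each match stops at the first following </p>.
# No backslash before / is needed: Python doesn't use / as a delimiter.
P_ELEMENT = r'<p(\s[^>]*)?>.*?</p>'

# Without DOTALL, '.' won't match '\n', so the two-line <p> survives.
without_s = re.sub(P_ELEMENT, '', html)

# With DOTALL (like Perl's /s), both <p> elements are removed.
with_s = re.sub(P_ELEMENT, '', html, flags=re.DOTALL)

print(with_s.strip())  # prints only the <a ...><img ... /></a> line
```

Without re.DOTALL only the single-line paragraph goes; with it both do.
Like any regex over HTML this is fragile: a <p> whose optional </p> end
tag has been left out would swallow everything up to the next </p>.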

-- 
David Pashley
david at davidpashley.com
Nihil curo de ista tua stulta superstitione.


More information about the BNMlist mailing list. Powered by Wessex Networks