Use a very naive approach: define the similarity of two media pieces as the number of tags they share. This can be implemented directly in SQL, with a self-join on the tag-assignment table followed by a count per pair, and it produces decent results.
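A minimal sketch of the shared-tag count, using Python's built-in `sqlite3` module so the SQL can be run as-is. The `media_tags` table, its `media_id` and `tag` columns, and the `most_similar` helper are all assumptions made for illustration; the actual schema will differ.

```python
import sqlite3


def most_similar(conn, media_id, limit=5):
    """Return (other_media_id, shared_tag_count) rows, most similar first.

    Assumes a hypothetical table media_tags(media_id, tag) with one row
    per tag assignment.
    """
    query = """
        SELECT b.media_id, COUNT(*) AS shared_tags
        FROM media_tags AS a
        JOIN media_tags AS b
          ON a.tag = b.tag              -- same tag on both sides
         AND a.media_id <> b.media_id   -- skip the item itself
        WHERE a.media_id = ?
        GROUP BY b.media_id
        ORDER BY shared_tags DESC, b.media_id ASC
        LIMIT ?
    """
    return conn.execute(query, (media_id, limit)).fetchall()


# Toy data, made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_tags ("
    "media_id INTEGER, tag TEXT, PRIMARY KEY (media_id, tag))"
)
conn.executemany(
    "INSERT INTO media_tags VALUES (?, ?)",
    [
        (1, "jazz"), (1, "live"), (1, "piano"),
        (2, "jazz"), (2, "live"),
        (3, "piano"),
        (4, "metal"),
    ],
)

print(most_similar(conn, 1))  # [(2, 2), (3, 1)]
```

Items with no tag in common never appear in the result, because the join yields no rows for them. The composite primary key keeps a tag from being counted twice for the same item.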