What is content-based filtering?
Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
To demonstrate content-based filtering, let’s hand-engineer some features for the Google Play store. The following figure shows a feature matrix where each row represents an app and each column represents a feature. Features could include categories (such as Education, Casual, Health), the publisher of the app, and many others. To simplify, assume this feature matrix is binary: a non-zero value means the app has that feature.
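The feature matrix described above can be built from a mapping of apps to their features. The sketch below is minimal. The three apps, their categories, and the publisher names are hypothetical examples, not real Google Play data.

```python
# Hypothetical apps and features, for illustration only.
apps = {
    "MathTutor": {"Education", "PublisherA"},
    "PuzzleFun": {"Casual", "PublisherB"},
    "StepCounter": {"Health", "PublisherA"},
}

# One column per distinct feature, in a fixed (sorted) order.
features = sorted({f for feats in apps.values() for f in feats})

# One row per app: 1 if the app has the feature, 0 otherwise.
matrix = {
    app: [1 if f in feats else 0 for f in features]
    for app, feats in apps.items()
}

print(features)
for app, row in matrix.items():
    print(app, row)
```

Each row is a binary vector, so two apps that share a category or publisher have overlapping 1s in the same columns.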
What are the components of a recommender?
A recommender typically consists of three components:
- Candidate generation: narrows a huge pool of items (often thousands or more) down to a much smaller subset of candidates to recommend to a user.
- Scoring: candidates can come from several different generators whose outputs are not directly comparable, so a scoring system assigns every candidate a score on a common scale.
- Re-ranking: after scoring, the system takes additional constraints into account (for example, removing items the user has explicitly disliked) to produce the final ranking.
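The three stages can be sketched as a small pipeline. This is an illustrative sketch, not a production design. The two generators (`by_genre`, `by_popularity`), the per-genre weights used for scoring, and the "drop disliked items" constraint in re-ranking are all hypothetical choices.

```python
# Every function name, item, and weight here is a hypothetical placeholder.

def by_genre(items, liked_genres):
    """Generator 1: items in genres the user likes."""
    return {i["id"] for i in items if i["genre"] in liked_genres}

def by_popularity(items, top_n):
    """Generator 2: the most-viewed items overall."""
    ranked = sorted(items, key=lambda i: i["views"], reverse=True)
    return {i["id"] for i in ranked[:top_n]}

def score(candidate_ids, items, genre_weights):
    """Scoring: one common scale for candidates from any generator."""
    genre_of = {i["id"]: i["genre"] for i in items}
    return {cid: genre_weights.get(genre_of[cid], 0.0) for cid in candidate_ids}

def rerank(scores, disliked_ids, k):
    """Re-ranking: drop disliked items, sort by score (ties broken by id), keep top k."""
    kept = [(s, cid) for cid, s in scores.items() if cid not in disliked_ids]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in kept[:k]]

items = [
    {"id": "a", "genre": "thriller", "views": 50},
    {"id": "b", "genre": "comedy", "views": 900},
    {"id": "c", "genre": "thriller", "views": 10},
    {"id": "d", "genre": "drama", "views": 300},
    {"id": "e", "genre": "drama", "views": 5},
]
candidates = by_genre(items, {"thriller"}) | by_popularity(items, top_n=2)
scores = score(candidates, items, {"thriller": 0.9, "drama": 0.5, "comedy": 0.1})
final = rerank(scores, disliked_ids={"c"}, k=2)
print(final)  # ['a', 'd']
```

Item "e" never reaches scoring because neither generator proposes it, and item "c" scores as high as "a" but is removed by the re-ranking constraint.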
What is content-based and collaborative filtering?
Recommendation approaches such as content-based filtering and collaborative filtering use different sources of information to make recommendations. Content-based filtering makes recommendations based on a user's preferences for product features. Collaborative filtering mimics user-to-user recommendations: it predicts a user's preferences as a linear, weighted combination of other users' preferences.
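One common way to realize a "weighted combination of other users' preferences" is user-based collaborative filtering, which predicts a rating as an average of other users' ratings weighted by how similar each of those users is to the target user. The sketch below assumes cosine similarity over co-rated items and uses made-up ratings. Other similarity measures and weighting schemes are also used in practice.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"book1": 5, "book2": 3},
    "bob":   {"book1": 4, "book2": 3, "book3": 4},
    "carol": {"book1": 1, "book2": 5, "book3": 2},
}

def cosine(u, v):
    """Cosine similarity computed only over items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = cosine(ratings[user], r)
        num += w * r[item]
        den += abs(w)
    return num / den if den else None  # None: nobody else rated the item

print(round(predict("alice", "book3"), 2))  # 3.19
```

Bob's ratings resemble Alice's more closely than Carol's do, so the prediction lands nearer Bob's rating of 4 than Carol's rating of 2.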
Both methods have limitations. Content-based filtering can recommend a new item, but it needs more data about the user's preferences to find the best match. Similarly, collaborative filtering needs a large dataset of active users who have rated a product before in order to make accurate predictions. Systems that combine these different approaches are called hybrid systems.
How is content-based filtering implemented?
Content-based filtering revolves entirely around comparing user interests to product features. The products whose features overlap most with the user's interests are the ones recommended.
Because product features are so central to this approach, it is important to discuss how a user's favorite features are determined.
Two methods can be used here, possibly in combination. First, users can be given a list of features and asked to choose the ones they identify with most. Second, the algorithm can keep track of the products the user has chosen before and add those products' features to the user's data.
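The two ways of deciding a user's favorite features can be combined into a single interest vector. The sketch below is minimal. It assumes a hypothetical three-feature vocabulary and item catalog, and a simple count in which each explicitly chosen feature and each feature of a previously chosen product adds 1. Real systems may weight these two sources differently.

```python
# Hypothetical feature vocabulary and item catalog, for illustration only.
FEATURES = ["Education", "Casual", "Health"]
ITEM_FEATURES = {
    "MathTutor": {"Education"},
    "PuzzleFun": {"Casual"},
    "YogaCoach": {"Health", "Education"},
}

def build_profile(selected_features, history):
    """Return a user interest vector aligned with FEATURES."""
    counts = {f: 0 for f in FEATURES}
    # Method 1: features the user explicitly picked from a list.
    for f in selected_features:
        counts[f] += 1
    # Method 2: features of products the user chose before.
    for item in history:
        for f in ITEM_FEATURES[item]:
            counts[f] += 1
    return [counts[f] for f in FEATURES]

profile = build_profile(selected_features=["Health"], history=["MathTutor", "YogaCoach"])
print(profile)  # [2, 0, 2]
```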
Similarly, product features can be identified by the product's developers, or users can be asked which features they believe best describe the product.
Once numerical values (whether binary 1/0 values or arbitrary numbers) have been assigned to product features and user interests, we need a way to measure the similarity between a product and the user's interests. A very basic choice is the dot product.
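Written out, the dot product sums the user's interest in each feature multiplied by the product's value for that feature, over all features. The notation here is introduced for illustration: $\mathbf{u}$ is the user's interest vector, and $\mathbf{x}_i$ is the feature vector of product $i$ over $d$ features. With binary values, the score simply counts the features that the user and the product share:

```latex
\mathrm{score}(u, i) = \mathbf{u} \cdot \mathbf{x}_i = \sum_{j=1}^{d} u_j \, x_{ij}
\mathbf{u} = (1, 0, 1),\quad \mathbf{x}_i = (1, 1, 0) \;\Rightarrow\; \mathbf{u} \cdot \mathbf{x}_i = 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 = 1
```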
Why is content-based better than collaborative filtering?
Content-based filtering does not require data from other users to make recommendations for a given user.
What are the main methods of content-based recommendation?
Content-based recommendation systems work with two main methods, each using different models and algorithms. Method 1 uses the vector space model, while method 2 uses a classification model.
1. The vector space method
Suppose you read a crime thriller by Agatha Christie and review it online. You also review a comedy novel, rating the crime thriller as good and the comedy as bad.
A rating system is then built from the information you provided. On a scale from 0 to 9, the crime thriller and detective genres are rated 9, other serious genres fall somewhere between 9 and 0, and comedy sits at the very bottom, possibly even below zero.
With this information, your next book recommendation will most likely be a crime thriller, since that is your highest-rated genre.
To implement this ranking, the system builds a user vector that encodes your ratings. It then builds an item vector for each book that describes the book in terms of its genres.
Each book is assigned a score by taking the dot product of the user vector and that book's item vector, and this score is used for recommendation.
In this way, the dot products for all available books are ranked, and the top 5 or top 10 books are recommended.
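The vector space method can be sketched in a few lines. The sketch below is minimal. It assumes a hypothetical four-genre vocabulary, a user vector of genre ratings that follows the scale described above (with comedy below zero), and made-up book titles whose item vectors mark genre membership with 1 or 0.

```python
# Hypothetical genre vocabulary; vector positions follow this order.
GENRES = ["crime_thriller", "detective", "drama", "comedy"]

# User vector: genre ratings (comedy is rated below zero).
user = [9, 9, 5, -2]

# Item vectors: 1 if the (made-up) book belongs to the genre, else 0.
books = {
    "Thriller A": [1, 1, 0, 0],
    "Drama B": [0, 0, 1, 0],
    "Comedy C": [0, 0, 0, 1],
    "Crime Comedy D": [1, 0, 0, 1],
}

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(user_vec, item_vecs, k):
    """Score every book against the user vector and return the k best titles."""
    scores = {title: dot(user_vec, vec) for title, vec in item_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(top_k(user, books, k=2))  # ['Thriller A', 'Crime Comedy D']
```

Because the comedy rating is negative, a book that mixes crime and comedy (score 7) is pushed below a pure thriller (score 18) rather than the comedy genre simply being ignored.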
This vector space method was the first approach used by content-based recommendation systems to recommend items to users.
2. Classification method
The second method of content-based filtering is classification. Here, we can build a decision tree that predicts whether or not the user wants to read a given book.
For example, consider the book The Alchemist.
Based on the user data, the tree first checks the author's name: it is not Agatha Christie. It then checks the genre: it is not a crime thriller, nor is it a type of book you have ever reviewed. From these classifications, we conclude that this book should not be recommended to you.
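The walkthrough above corresponds to a small decision tree. The sketch below hand-codes the tree's splits to mirror the example. In practice the tree would usually be learned from the user's rated books, for instance with scikit-learn's `DecisionTreeClassifier`. The profile structure, the "good"/"bad" genre ratings, and the genre label given to The Alchemist are illustrative assumptions.

```python
# Hypothetical profile built from the reviews in the example.
profile = {
    "liked_authors": {"Agatha Christie"},
    "genre_ratings": {"crime thriller": "good", "comedy": "bad"},
}

def should_recommend(book, profile):
    """Hand-built decision tree whose splits mirror the example."""
    # Split 1: is the book by an author the user liked?
    if book["author"] in profile["liked_authors"]:
        return True
    # Split 2: has the user reviewed this genre before?
    rating = profile["genre_ratings"].get(book["genre"])
    if rating is None:
        return False  # never-reviewed genre: no evidence, so do not recommend
    # Split 3: recommend only if that genre was reviewed favorably.
    return rating == "good"

# The genre label here is an assumption made for illustration.
alchemist = {"title": "The Alchemist", "author": "Paulo Coelho", "genre": "philosophical fiction"}
print(should_recommend(alchemist, profile))  # False
```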
What are the advantages and disadvantages of content-based recommendation system?
The advantages of a content-based recommender system are as follows:
- Because the recommendations are specific to one user, the model does not need any information about other users. This makes it simpler to scale to a large number of users.
- The model can capture a user's specific interests and recommend niche items that very few other users are interested in.
- New items can be recommended before they have been rated by many users, unlike in collaborative filtering.
The disadvantages are as follows:
- This methodology requires a great deal of domain knowledge, because the feature representation of the items is hand-engineered to some extent. As a result, the model can only be as good as its hand-engineered features.
- The model can only make recommendations based on the user's existing interests. In other words, it has limited ability to expand beyond those interests.
- Since it must align the features of a user's profile with available products, content-based filtering offers only a small amount of novelty.
- Only item profiles are built in content-based filtering, and users are recommended items close to what they have already rated or searched for, rather than anything outside that history. Even a perfect content-based filtering system therefore cannot surface anything surprising or unexpected.