What factors affect how well a recommendation system performs? This article summarizes five of them; let's take a look.
In a website or app, the recommendation system usually interacts with many parts of the larger system, and the recommendation system itself has many components. Together with the overall environment the system operates in, this means that many factors determine whether a recommendation system ultimately performs well or badly. "Performance" here means overall effectiveness, including accuracy, recall, diversity and so on, without distinguishing among them. Below we discuss some of the main factors. Not all of them are under our control, but knowing what they are is still very useful when developing and optimizing the system.
1. User factors
Unlike an advertising system, which has to serve both users and advertisers, a recommendation system serves only one party: the user. User factors therefore greatly affect how well the system performs. Specifically, the ratio of new users to existing users is arguably one of the factors with the greatest impact. Recommendation systems depend heavily on user behavior, so for new users with little or no behavior history the results will not be very good. Consequently, the higher the proportion of new users, the worse the overall performance of the system.
This is a factor that a typical recommendation system cannot control by itself; solving it takes the joint effort of the whole platform. There are two approaches. One is to optimize the recommendation system's cold-start algorithm. This will help, but its ceiling is low. The other is to turn new users on the platform into existing users, that is, to get them to interact more with the platform and generate behavior data so that they leave the cold-start stage. Of the two, the second is likely to work better, mainly because the room for optimizing cold-start algorithms is quite limited, whereas once a user becomes a "warm" user, the full range of optimization strategies becomes applicable. The same idea can be borrowed in many situations: turn an unknown problem into a known one rather than creating a new one.
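The trade-off between the two approaches can be illustrated with a simple traffic-weighted average. This is only a sketch: the function name `blended_ctr` and all of the numbers are made up for illustration, and it assumes overall click-through rate is just the mix of a "new user" segment and an "existing user" segment.

```python
def blended_ctr(new_user_share: float, ctr_new: float, ctr_old: float) -> float:
    """Overall CTR as a traffic-weighted average of two user segments."""
    if not 0.0 <= new_user_share <= 1.0:
        raise ValueError("new_user_share must be in [0, 1]")
    return new_user_share * ctr_new + (1.0 - new_user_share) * ctr_old


# Hypothetical numbers: 60% of traffic is new users, who click far less.
baseline = blended_ctr(0.6, 0.02, 0.06)            # about 0.036

# Approach 1: a better cold-start algorithm lifts new-user CTR a little.
better_cold_start = blended_ctr(0.6, 0.025, 0.06)  # about 0.039

# Approach 2: the platform converts a third of new users into existing users.
more_converted = blended_ctr(0.4, 0.02, 0.06)      # about 0.044
```

Under these assumed numbers, shifting the user mix moves the overall figure more than the cold-start improvement does, which is the intuition behind preferring the second approach. Real segments will of course not behave this cleanly.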
2. Product design factors
Product design factors refer to where and in what form recommended items are shown to users. If the recommendation algorithm is a person's inner character, then product design is that person's face. In an age that judges by appearances, how good the presentation looks greatly affects how much of the algorithm's power actually reaches the user. The most common presentation factors include, but are not limited to:
The quality of the picture. The Internet entered the era of image-based reading long ago. Whatever is being recommended, whether products, news or something else, an item with a picture is more attractive than one without. When every item has a picture, the size and sharpness of the picture strongly influence whether the user is interested. Beyond the basics of size and sharpness, the quality of the information the picture conveys is also critical. For example, if a product picture does not show the main features of the product and the details users care about, the probability of a click drops sharply. After all, everyone is busy, and clicking has a cost. For scenarios where users take the photos themselves, such as C2C marketplaces, it is therefore particularly important to guide users toward taking high-quality product pictures. On this point, the story of Airbnb paying out of its own pocket to have hosts' listings photographed is enough to show how much it matters.
The attractiveness of the title. In addition to pictures, the title and text description are also very important. After all, text is still a major way for people to obtain information. On the Zhuanzhuan platform, some lazy users write only something like "see picture, message me privately" in the text description. Such descriptions are clearly less competitive, and they also give the impression that the seller does not care much about the product, so unless the product is very competitive in other respects, it is hard to get a conversion.
An attractive title matters, but if a listing is "top-heavy", focusing only on the title and neglecting the quality of the item itself, it backfires and annoys users. The most typical example is the clickbait articles now flooding feeds. To attract clicks they put great effort into the headline, but once users click through they find that the article is of low quality or does not match the headline. This seriously damages the platform's credibility and amounts to killing the goose that lays the golden eggs.
Therefore, text descriptions should be as comprehensive as possible without departing from the facts. The long-term development of the platform should not be sacrificed for a short-term gain in click-through rate.
Whether key information is displayed. Key information is information that can influence whether users click and convert. Besides the pictures and text descriptions mentioned above, each business scenario has its own key information, such as sales volume or number of reviews. It matters for two reasons. First, this information itself affects user conversion. Second, the recommendation algorithm may use it during recall or ranking, so displaying it serves, to some extent, as an explanation of the recommendation.
Whether there is distracting information. This refers to whether other content around the module competes for the user's attention, and whether the user can focus on browsing the recommendation module. Typically, eye-catching advertisements or promotion/event banners placed next to the recommendation slot affect the user's attention to varying degrees and thus affect conversion. If the recommendation system is an important part of your business, give it enough dedicated space and a dedicated position, and try not to mix it with other content. In a complex world, less is often more.
3. Data factors
The recommendation system is a typical algorithm-driven system. If the algorithm is the skeleton of the system, then data is its blood. If the quality or quantity of data is insufficient, the results of any algorithm will suffer. Insufficient quantity is easy to understand, and whether there is enough data usually depends on the growth of the whole website or app, which is beyond our control. Data quality is different: it can be steadily improved through effort. So here we briefly discuss common data quality problems.
Missing key information. Missing information is one of the biggest data quality problems, especially key information that affects the algorithm strategy or the ranking model. For example, the exposure data may lack the specific position at which an item was shown, or the display log may lack how long the user stayed. Missing information of this kind directly degrades the algorithm and, in turn, the final results. These problems often arise because no algorithm people were involved when the data system was first built, so the relevant fields were never designed in. Fortunately, this kind of problem is relatively easy to solve, provided the gap is discovered early.
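As a rough illustration of what "designing the fields in" and "discovering the gap early" might look like, the sketch below defines an exposure record that carries position and dwell time, plus a helper that reports how often a field is missing. The class `ExposureEvent`, its field names, and `missing_rate` are all hypothetical and not taken from any particular logging system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExposureEvent:
    """One item shown to one user (hypothetical schema)."""
    user_id: str
    item_id: str
    timestamp_ms: int
    position: Optional[int] = None   # slot index where the item was displayed
    dwell_ms: Optional[int] = None   # how long the user stayed on the item


def missing_rate(events: List[ExposureEvent], field_name: str) -> float:
    """Fraction of events in which the given field was not logged (is None)."""
    if not events:
        return 0.0
    missing = sum(1 for e in events if getattr(e, field_name) is None)
    return missing / len(events)


events = [
    ExposureEvent("u1", "i1", 1, position=0, dwell_ms=1200),
    ExposureEvent("u1", "i2", 2),  # position and dwell time were never logged
]
position_gap = missing_rate(events, "position")
```

A check like this could run as part of routine data monitoring, so that a missing field shows up as a number on a dashboard rather than as an unexplained drop in model quality.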
Poorly designed data that is complicated to use. In another situation, all the key information is present and nothing important is missing, but the data structure or table design is unreasonable, so obtaining one piece of information requires joining multiple tables or going through complex calculation logic. Although the key information can be obtained, the high cost of obtaining it means that implementation tends to cut corners to varying degrees, which compromises data quality and hurts the final results. Broadly speaking, the solution is to build a good algorithm-oriented data warehouse or data mart, so that acquiring, changing and maintaining data is as simple as possible, reducing the cost of data work and improving the efficiency of data use.
The data used by the recommendation system is usually a subset of the data of the entire website. Controlling the quality of this data therefore requires the joint effort of the recommendation system developers and the data system developers, to ensure that the data is both available and easy to use.
4. Algorithm strategy factors
Having covered all of the above, we finally come to the core of the recommendation system: the algorithm strategy. That the algorithm strategy influences results is beyond question, but its influence is multifaceted. Specifically, the algorithm may affect results in the following ways.
Algorithm complexity affects accuracy. More complex algorithms are generally more accurate; whatever specific algorithm is used, this overall trend holds. For example, a simple ranking model may not match a nonlinear model, raw continuous-valued features may not match discretized, nonlinear features, and for sequence problems a vanilla RNN may not match an LSTM. Provided data quality is assured, using a more complex model is a fairly dependable way to improve results. Of course, the algorithm must fit the business; it should not be made complex for complexity's sake.
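To make "discretized features" concrete, here is a minimal sketch of bucketing a continuous value into a one-hot vector. The function name `one_hot_bucket` and the price boundaries are assumptions chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def one_hot_bucket(value: float, boundaries: Sequence[float]) -> List[int]:
    """Map a continuous value to a one-hot vector over len(boundaries) + 1 buckets.

    `boundaries` must be sorted ascending; a value equal to a boundary
    falls into the bucket to the right of that boundary.
    """
    index = bisect_right(boundaries, value)
    vector = [0] * (len(boundaries) + 1)
    vector[index] = 1
    return vector


# Hypothetical price buckets: <10, [10, 50), [50, 200), >=200
price_boundaries = [10.0, 50.0, 200.0]
mid_price = one_hot_bucket(35.0, price_boundaries)   # [0, 1, 0, 0]
high_price = one_hot_bucket(500.0, price_boundaries) # [0, 0, 0, 1]
```

With a single continuous input, a linear model can only learn one weight, so its response to price is a straight line. With the one-hot version it learns one weight per bucket, which lets the same linear model express a nonlinear, step-shaped response.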
The stability of the algorithm affects the stability of the results. There is a class of machine learning models with low bias but high variance. High variance means that the performance of the trained model differs considerably across datasets; another name for this phenomenon is overfitting. With enough data and reasonable regularization, overfitting is relatively easy to avoid, so the problem mostly arises when data is insufficient. In that case a simple model such as a linear model should be chosen to keep results stable, and even rule-based algorithms can be considered.
Why pay attention to the stability of the results? The reason is similar to why we care about average-case complexity when studying algorithm design and analysis. Although we want a very accurate model, we also want it to be stable and predictable when it runs online. It should not perform well today and poorly tomorrow. In practice, whatever the accuracy, we want stability to be guaranteed.
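The link between small data, regularization and stability can be seen in a toy experiment. The sketch below is an assumption-laden illustration, not a recipe: it fits a one-parameter model y ≈ w·x on many tiny synthetic training sets and measures how much the fitted w varies, with and without an L2 penalty. The function names, the data-generating process and all constants are made up.

```python
import random
from statistics import pvariance


def fit_slope(xs, ys, l2=0.0):
    """Closed-form slope of y ~ w * x (no intercept) with penalty l2 * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


def slope_variance(l2, n_points=5, n_trials=200, seed=0):
    """Variance of the fitted slope across many small synthetic training sets."""
    rng = random.Random(seed)  # fixed seed so both settings see the same data
    slopes = []
    for _ in range(n_trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
        ys = [1.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # true slope 1, noisy labels
        slopes.append(fit_slope(xs, ys, l2))
    return pvariance(slopes)


unregularized = slope_variance(l2=0.0)
regularized = slope_variance(l2=10.0)
```

In this setup the regularized fit varies much less from one training set to the next. The price is bias: the penalty shrinks the slope toward zero, so the model is more stable but systematically underestimates the true slope. That is the same trade the text describes when it recommends simpler models for small data.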
5. Engineering architecture factors
Finally, let's briefly talk about engineering architecture. Whatever the data and whatever the algorithm, a concrete engineering implementation is needed before anything is presented to users. The architecture chosen for that implementation also affects results.
The impact of latency. The response speed of the interface is the most direct way users experience the engineering architecture. Slow responses make users impatient, and they may simply leave. There are usually three ways to speed things up. The first is to optimize the algorithm to remove unnecessary computation. The second is to choose a simpler algorithm. The third is to use caching: do as little computation as possible online and move the rest to the offline or near-line layer, reducing the burden of real-time computation.
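The third method can be sketched as a lookup into results precomputed offline, with a cheap online fallback for users who have no precomputed list. This is a minimal illustration; the class `CachedRecommender`, its fields, and the popularity fallback are hypothetical, and a real system would typically use an external key-value store rather than an in-memory dictionary.

```python
from typing import Callable, Dict, List


class CachedRecommender:
    """Serve offline-precomputed lists; fall back to online logic on a miss."""

    def __init__(
        self,
        precomputed: Dict[str, List[str]],
        fallback: Callable[[str], List[str]],
    ) -> None:
        self._precomputed = precomputed  # filled by an offline or near-line job
        self._fallback = fallback        # cheap online logic for cache misses
        self.online_calls = 0            # counts how often the fallback ran

    def recommend(self, user_id: str, k: int = 10) -> List[str]:
        items = self._precomputed.get(user_id)
        if items is None:
            self.online_calls += 1
            items = self._fallback(user_id)
        return items[:k]


recommender = CachedRecommender(
    precomputed={"u1": ["a", "b", "c"]},
    fallback=lambda user_id: ["popular1", "popular2"],
)
hit = recommender.recommend("u1", k=2)   # served from the precomputed table
miss = recommender.recommend("u2")       # served by the fallback
```

The design choice here is to keep the online path down to a dictionary lookup and a slice, accepting that precomputed lists may be somewhat stale.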
The impact of architecture design on troubleshooting and monitoring. Experienced drivers know that a car that can run but cannot be repaired should never be on the road. In the same way, the architecture of a recommendation system should be friendly to troubleshooting, so that when a problem occurs or a hypothesis needs to be verified, the cause can be located quickly inside the system, instead of adding debug information to the live system on the spot and dragging out the investigation. A good engineer leaves a way out when designing the system, rather than scrambling when something goes wrong.
The impact of architecture design on iteration speed. Beyond the visible impact of response speed, whether the architecture supports rapid strategy iteration has a less visible impact on results. If the overall architecture is bloated, modules are not clearly separated, and the basic logic lacks proper abstraction and unification, data and strategy iterations cannot proceed quickly: each iteration requires a very complicated process, and correctness cannot be guaranteed. Such problems slow down the development of the system and ultimately hurt results.
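One way to picture clear module separation together with built-in troubleshooting support is a pipeline of named, swappable stages that records how many candidates survive each stage. The sketch below is only an assumed design for illustration; `run_pipeline` and the three toy stages are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A stage takes (user_id, candidates) and returns a new candidate list.
Stage = Callable[[str, List[str]], List[str]]


def run_pipeline(
    user_id: str, stages: List[Tuple[str, Stage]]
) -> Tuple[List[str], Dict[str, int]]:
    """Run named stages in order and record the candidate count after each."""
    candidates: List[str] = []
    trace: Dict[str, int] = {}
    for name, stage in stages:
        candidates = stage(user_id, candidates)
        trace[name] = len(candidates)
    return candidates, trace


def recall_popular(user_id: str, candidates: List[str]) -> List[str]:
    return candidates + ["a", "b", "c", "d"]  # toy recall source


def filter_seen(user_id: str, candidates: List[str]) -> List[str]:
    seen = {"b"}  # toy "already seen" set
    return [c for c in candidates if c not in seen]


def rank_reverse_alphabetical(user_id: str, candidates: List[str]) -> List[str]:
    return sorted(candidates, reverse=True)  # stand-in for a ranking model


items, trace = run_pipeline(
    "u1",
    [
        ("recall", recall_popular),
        ("filter", filter_seen),
        ("rank", rank_reverse_alphabetical),
    ],
)
```

Because each stage is a plain function behind a common signature, trying a new ranking strategy means swapping one entry in the list, and the per-stage trace gives a first answer to "where did the candidates go?" without adding debug code to the live system.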
Beyond the factors mentioned above, many details also affect the final results of a recommendation system. So when improving a recommendation system, we should not fix our eyes on one spot. We need a view of the whole picture, from which we can find what currently has the greatest impact on results and then optimize it in a targeted way. As for factors that cannot be changed for the time being, we should stay aware of them and intervene when the time is right.