Using Machine Learning to Predict Amazon Search Rankings: Softpact eBusiness Solutions

Yet another report, this one from Jumpshot, a data intelligence company, found that more consumer product searches happen on Amazon than on Google. Furthermore, 90 percent of Amazon's product views come from the company's organic site search, not from advertising or external channels, according to Jumpshot.

Thus, given how important it is for retailers to optimize for Amazon's search engine, A9, it's worth understanding the ranking factors.

It's widely reported that the goal of Amazon's search engine is to rank products according to their sales potential. Many factors could influence sales, such as pricing, reviews, and product page copy. Presumably the products that excel in these areas are rewarded with better rankings.


It's difficult to determine the relative importance of those factors, especially since Amazon doesn't disclose them. So I tried to find out.

I'll explain my process in this article.

Predicting Sales Potential

While browsing Amazon's "Best Sellers" sections for various products, I noticed that in many key categories, such as "Electronics" and "Automotive," the top sellers typically have the most reviews, or nearly the most.

Could the number of product reviews be a proxy for a product's sales and thus for its rankings? Presumably, reviewers buy the product before writing about their experience.
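One quick way to sanity-check whether review counts track sales is a rank correlation between each product's bestseller rank and its review count. The sketch below computes Spearman's rho in plain Python, giving tied values their average rank. It is an illustration only: the ranks and review counts in the usage lines are hypothetical, not taken from the JungleScout data.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (assumes non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: rank 1 is the top seller.
best_seller_rank = [1, 2, 3, 4, 5]
review_counts = [5400, 4100, 4300, 900, 350]
rho = spearman(best_seller_rank, review_counts)
```

Here rho comes out at -0.9: a strongly negative value means better (smaller) sales ranks go with more reviews, which is the pattern the proxy idea relies on.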

To test this, I used machine learning. Machine learning can do more than generate predictions. A little-recognized use of machine learning is to build a model and then learn (in some cases) which features matter most in making the prediction. I'll use that approach here, with these steps.

Prepare a machine learning source file with Amazon bestseller information, including reviews.

Augment this source file with review sentiment analysis using Google's Natural Language API.

Upload this file to BigML, an easy-to-use machine learning tool.

Generate a deep neural network model (i.e., a model loosely inspired by the human brain that learns to recognize patterns) to predict the number of reviews in the dataset.

Review the features that most influence the model's predictions. These are the factors that matter most for getting more reviews and, by proxy, sales.

Source File

I found a list of best sellers from Q4 2017 at JungleScout, an Amazon intelligence tool. The list includes around 10,000 unique products per category, across several categories. I focused on "Automotive."

JungleScout's site contained a list of Q4 2017 best sellers on Amazon.


The dataset contains 15 columns, such as the Amazon Standard Identification Number (ASIN), product subcategory, and product name. Here is the full list of columns.

gl_product_group_desc
Subcategory
asin
upc1
item_name
merchant_brand_name
customer_average_review_rating
customer_review_count
has_fba_offer
has_retail_offer
total_offers
min_price
max_price
min_3p_price
max_3p_price
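Before modeling, the numeric columns in such an export need to be read as numbers rather than text. The sketch below uses Python's standard csv module to load the file, convert the count, rating, and price columns, and index rows by ASIN. It assumes the export is a CSV with the column names listed above and plain numeric values (no currency symbols or thousands separators); the sample row is hypothetical.

```python
import csv
import io

# Columns from the bestseller list above that hold numbers (assumed plain numerics).
INT_COLUMNS = {"customer_review_count", "total_offers"}
FLOAT_COLUMNS = {"customer_average_review_rating", "min_price", "max_price",
                 "min_3p_price", "max_3p_price"}


def parse_number(text, cast):
    """Convert a CSV cell to a number, treating blank cells as missing (None)."""
    text = text.strip()
    return cast(text) if text else None


def load_bestsellers(fileobj):
    """Read bestseller rows, converting numeric columns, keyed by ASIN."""
    rows = {}
    for row in csv.DictReader(fileobj):
        # Absent columns come back as None from row.get, so fall back to "".
        for col in INT_COLUMNS:
            row[col] = parse_number(row.get(col) or "", int)
        for col in FLOAT_COLUMNS:
            row[col] = parse_number(row.get(col) or "", float)
        rows[row["asin"]] = row
    return rows


# Hypothetical sample with a subset of the columns.
sample = io.StringIO(
    "asin,item_name,customer_review_count,customer_average_review_rating,min_price\n"
    "B000TEST01,Hypothetical floor mat,1520,4.4,19.99\n"
)
products = load_bestsellers(sample)
```

In practice you would pass an opened file instead of the StringIO sample.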

I also wanted to extract the product review text and use it to calculate the sentiment of the reviews, in case sentiment is predictive. Julian McAuley, an assistant professor of computer science at the University of California at San Diego, has assembled a collection of Amazon review text. I downloaded the automotive reviews from his site for my test.

That dataset has nine columns. Here is the list.

asin
helpful
overall
reviewText
reviewTime
reviewerID
reviewerName
summary
unixReviewTime
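The review files can be parsed one record per line. The sketch below assumes each line is a single JSON object with the fields listed above and that the download is gzip-compressed; the file path is hypothetical. It also assumes "helpful" holds a pair of [helpful votes, total votes] and derives a helpfulness ratio from it, which is an extra feature, not one the article used.

```python
import gzip
import json


def parse_reviews(lines):
    """Parse one JSON review object per line, adding a helpfulness ratio."""
    reviews = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        review = json.loads(line)
        # Assumed shape: "helpful" is [helpful_votes, total_votes].
        votes_up, votes_total = review.get("helpful", [0, 0])
        review["helpful_ratio"] = votes_up / votes_total if votes_total else None
        reviews.append(review)
    return reviews


def load_reviews(path):
    """Read a gzip-compressed, one-object-per-line review file (path is hypothetical)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return parse_reviews(fh)


# Hypothetical single review line.
sample_line = json.dumps({"asin": "B000TEST01", "helpful": [2, 3],
                          "overall": 5.0, "reviewText": "Fits well."})
reviews = parse_reviews([sample_line, ""])
```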

I combined both datasets, which provided many potential predictive factors, as follows.

reviewerID
asin
reviewerName
helpful
reviewText
overall
summary
unixReviewTime
reviewTime
gl_product_group_desc
Subcategory
upc1
item_name
merchant_brand_name
customer_average_review_rating
customer_review_count
has_fba_offer
has_retail_offer
total_offers
min_price
max_price
min_3p_price
max_3p_price
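The two datasets share the asin column, so combining them is a join on that key. A minimal sketch, assuming reviews are dicts and bestseller rows are indexed by ASIN: it keeps only reviews whose product appears in the bestseller list (an inner join) and places review fields before product fields, mirroring the column order above. The sample records are hypothetical.

```python
def combine(reviews, products_by_asin):
    """Inner-join review records to bestseller rows on their shared 'asin' key."""
    combined = []
    for review in reviews:
        product = products_by_asin.get(review["asin"])
        if product is None:
            continue  # review for a product that is not in the bestseller list
        # Review fields first, then product fields; 'asin' is identical in both.
        combined.append({**review, **product})
    return combined


# Hypothetical inputs.
reviews_sample = [{"asin": "A1", "overall": 5.0}, {"asin": "B2", "overall": 1.0}]
products_sample = {"A1": {"asin": "A1", "item_name": "Floor mat"}}
merged = combine(reviews_sample, products_sample)
```

An inner join drops reviews for products outside the bestseller list; a left join would keep them with empty product fields, which would not help predict the bestseller columns.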

Next, I wanted to capture the sentiment of the reviews.

Sentiment of Reviews

Google's Natural Language Processing API can help. I processed the review texts with that tool and captured four additional fields: Clearly Positive, Clearly Negative, Neutral, and Mixed. Each of those fields contained a "document score," "magnitude per document," and the "highest-scoring sentence."
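The API itself returns numbers: a score from -1.0 (negative) to 1.0 (positive) and a non-negative magnitude for overall emotional strength. Labels such as Clearly Positive or Mixed are an interpretation of those two numbers, and Google's documentation leaves the exact thresholds to the user. The sketch below shows one such mapping; the cutoff values are hypothetical and would need tuning on real reviews.

```python
def sentiment_label(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Map a document score/magnitude pair to one of four coarse labels.

    The cutoffs are hypothetical defaults, not values published by Google.
    """
    if score >= score_cutoff:
        return "Clearly Positive"
    if score <= -score_cutoff:
        return "Clearly Negative"
    # Near-zero score: low magnitude means little emotion, high means mixed feelings.
    return "Neutral" if magnitude < magnitude_cutoff else "Mixed"


labels = [sentiment_label(s, m) for s, m in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]]
```

The magnitude check matters because a near-zero score can come from either an unemotional review or one whose strong positive and negative sentences cancel out.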

Google Natural Language Processing API can identify emotions and sentiments behind text — reviews on Amazon in this case.


To be sure, reviewers on Amazon also provide a rating (one to five stars), and I have that in the dataset. But I wanted to see whether a more granular analysis would provide additional predictive factors.

Here are example document and sentence sentiments for product B00GG9FB8U.

{
    'asin': 'B00GG9FB8U',
    'best_sentence_magnitude': 0.8,
    'best_sentence_score': 0.8,
    'document_magnitude': 7.3,
    'document_score': 0.1
}
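A record like this can be produced by flattening the JSON that the analyzeSentiment REST method returns, which carries a documentSentiment object and a sentences list with a sentiment object per sentence. The sketch below picks the highest-scoring sentence as the "best" one. It works on an already-parsed response dict rather than calling the API, and the sample response is made up so that its numbers match the example above.

```python
def summarize_sentiment(asin, response):
    """Flatten a parsed analyzeSentiment JSON response into one feature record.

    JSON output can omit zero-valued numbers, so missing values default to 0.0.
    """
    doc = response.get("documentSentiment", {})
    sentences = response.get("sentences", [])
    # "Best" sentence = the sentence with the highest sentiment score.
    best = max(sentences,
               key=lambda s: s.get("sentiment", {}).get("score", 0.0),
               default={})
    best_sentiment = best.get("sentiment", {})
    return {
        "asin": asin,
        "best_sentence_magnitude": best_sentiment.get("magnitude", 0.0),
        "best_sentence_score": best_sentiment.get("score", 0.0),
        "document_magnitude": doc.get("magnitude", 0.0),
        "document_score": doc.get("score", 0.0),
    }


# Made-up response shaped like the REST output.
sample_response = {
    "documentSentiment": {"magnitude": 7.3, "score": 0.1},
    "sentences": [
        {"text": {"content": "First sentence."}, "sentiment": {"magnitude": 0.8, "score": 0.8}},
        {"text": {"content": "Second sentence."}, "sentiment": {"magnitude": 0.9, "score": -0.9}},
    ],
}
record = summarize_sentiment("B00GG9FB8U", sample_response)
```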

After adding the sentiments to the dataset, I was ready for the most exciting part: learning which factors are the most predictive.

Machine Learning with BigML

I uploaded the source file to BigML, the aforementioned machine learning tool.

I selected customer_review_count as the prediction target and a deep neural network as the type of machine learning model to build, because it is often the most powerful option.

BigML searched 128 model configurations to find the best-performing one. Here are the results, in order: the top predictors of review count and, by proxy, sales.

1. Subcategory: 86.73%
2. Field1 (product number): 9.6%
3. Item_name: 3.49%
4. Total_offers: 0.06%
5. Upc1: 0.04%
6. Customer_average_review_rating: 0.03%
7. Max_price: 0.02%
8. Min_price: 0.01%
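BigML computes these field importances with its own method. As a swapped-in, model-agnostic illustration of the same idea (a trained model can reveal which inputs it relies on), the sketch below implements permutation importance: shuffle one feature column at a time, measure how much the prediction error rises, and normalize the rises so they sum to 100%, similar in spirit to the percentages above. The toy model and data are hypothetical.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average rise in error when each feature column is shuffled, as percentages."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    raw = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mean_squared_error(model, shuffled, targets) - baseline
        raw.append(max(increase / n_repeats, 0.0))  # clip noise below zero
    total = sum(raw)
    return [100 * v / total if total else 0.0 for v in raw]


def toy_model(row):
    """Hypothetical model that uses only the first feature."""
    return 3 * row[0]


rows = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 9], [6, 2]]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
```

Because the toy model ignores the second feature, shuffling it never changes the error, so all of the importance lands on the first feature.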

I was surprised that review sentiment had no impact at all and that ratings ("6. Customer_average_review_rating") and price ("7. Max_price" and "8. Min_price") had very little predictive impact.

A product's category on Amazon is the best predictor of sales, according to a machine-learning analysis using BigML.


But I can now see how the choice of product category and product name could have a significant impact, because some products and categories are inherently popular, with strong demand. Likewise, the number of offers for a product (total_offers) also predicted overall sales, though far less strongly.