
# Spark MLlib Collaborative Filtering Algorithm: Annotated Translation

Published 2017-05-27

## pyspark.mllib.recommendation module: annotated translation

class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model)[source]
A matrix factorisation model trained by regularized alternating least-squares.

>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
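Rating data is a (user, product, rating) triple. Rather than writing bare tuples as the official example does, it is better to build the RDD from Rating objects, which are easier to read.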
>>> ratings = sc.parallelize([r1, r2, r3])
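trainImplicit trains on implicit feedback: the input values are not explicit ratings, in contrast to explicit training with train. The second argument is the rank of the factor matrices U and V in A = U*V, and seed is the random seed used to initialize the factor matrices.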
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...
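
Following the note above about preferring Rating records over bare tuples, here is a minimal sketch of the idea in plain Python. The `Rating` defined here is only a stand-in namedtuple with the same three fields as `pyspark.mllib.recommendation.Rating`; the variable names `raw` and `ratings` are illustrative.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating, which is itself a
# namedtuple with the fields (user, product, rating).
Rating = namedtuple("Rating", ["user", "product", "rating"])

# The same three observations as r1, r2, r3 in the transcript.
raw = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]

# Wrap each plain tuple so the fields are named rather than positional.
ratings = [Rating(int(u), int(p), float(r)) for u, p, r in raw]

for r in ratings:
    print(r.user, r.product, r.rating)
```

With a live SparkContext, the same list could be passed to `sc.parallelize` after importing the real `Rating` class instead of defining the stand-in.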

>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]

>>> model = ALS.train(ratings, 4, seed=10)

>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

Recommend users for a given product (here, the top 2 users for product 1)
>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]

>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4

The user feature matrix is the U matrix from the factorization above
>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4

The product feature matrix is the V matrix mentioned above; each entry belongs to one product
>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]

>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4
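
To connect the U and V matrices to predictions: in a matrix factorization model the predicted rating for a (user, product) pair is the dot product of that user's latent vector and that product's latent vector. The sketch below shows this in plain Python. The feature values are made up for illustration, and `user_features`, `product_features`, and this `predict` function are hypothetical names, not the library's internals.

```python
from array import array

# Hypothetical latent vectors in the same (id, array('d', ...)) shape that
# userFeatures() and productFeatures() return; the numbers are made up.
user_features = {1: array("d", [0.5, 1.0]), 2: array("d", [1.0, 2.0])}
product_features = {1: array("d", [1.0, 0.5]), 2: array("d", [2.0, 1.0])}


def predict(user, product):
    """Dot product of the user's row of U and the product's row of V."""
    u = user_features[user]
    v = product_features[product]
    return sum(a * b for a, b in zip(u, v))


print(predict(1, 1))
print(predict(2, 2))
```

The length of each vector corresponds to the rank passed at training time, which is why `len(latents)` in the transcript equals 4 for a model trained with rank 4.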

>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))

>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))

>>> model = ALS.train(ratings, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
3.73...

>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
3.73...

>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
0.4...

>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass


New in version 0.9.0.

classmethod load(sc, path)[source]
Load a model from the given path.

New in version 1.3.1.

predict(user, product)[source]
Predicts the rating for the given user and product.

New in version 0.9.0.

predictAll(user_product)[source]
Returns an RDD of predicted ratings (Rating objects) for the input RDD of (user, product) pairs.

New in version 0.9.0.

productFeatures()[source]

Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.

New in version 1.2.0.

rank[source]

Rank for the features in this model

New in version 1.4.0.

recommendProducts(user, num)[source]

Recommends the top “num” number of products for a given user and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.
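
The "top num, sorted by predicted rating in descending order" behaviour described above can be sketched in plain Python as follows. The scores are invented, and `recommend_products` is a hypothetical helper for illustration only; it is not how the library computes its result.

```python
import heapq

# Hypothetical predicted scores for one user, keyed by product id.
predicted = {10: 3.2, 11: 4.7, 12: 1.5, 13: 4.1}


def recommend_products(scores, num):
    """Return the `num` highest-scoring (product, score) pairs, best first."""
    return heapq.nlargest(num, scores.items(), key=lambda item: item[1])


top2 = recommend_products(predicted, 2)
print(top2)
```

Asking for more items than exist simply returns every product, which mirrors the remark elsewhere in this reference that the number returned may be less than `num`.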

recommendProductsForUsers(num)[source]

Recommends top “num” products for all users. The number returned may be less than this.

recommendUsers(product, num)[source]

Recommends the top “num” number of users for a given product and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.

recommendUsersForProducts(num)[source]

Recommends top “num” users for all products. The number returned may be less than this.

userFeatures()[source]

Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

New in version 1.2.0.

class pyspark.mllib.recommendation.ALS[source]
Alternating Least Squares matrix factorization

New in version 0.9.0.

classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)[source]
Train a matrix factorization model given an RDD of ratings given by users to some products, in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, we run a given number of iterations of ALS. This is done using a level of parallelism given by blocks.

New in version 0.9.0.
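
To make the alternating step concrete, here is a toy rank-1 ALS in plain Python. It is a sketch under simplifying assumptions, not Spark's implementation: it uses a single latent number per user and product, a plain `lambda_` penalty, no `blocks` parallelism, and a positive random initialization chosen for this example. The names `als_rank1` and `squared_error` are hypothetical.

```python
import random


def als_rank1(ratings, iterations=5, lambda_=0.01, seed=None):
    """Toy ALS with rank 1: one latent number per user and per product."""
    rng = random.Random(seed)
    users = {u for u, _, _ in ratings}
    products = {p for _, p, _ in ratings}
    # Random positive initialization of the product factors.
    v = {p: rng.uniform(0.5, 1.5) for p in products}
    u = {}
    for _ in range(iterations):
        # Fix V, then solve the regularized least-squares problem per user.
        for user in users:
            rows = [(v[p], r) for uu, p, r in ratings if uu == user]
            u[user] = sum(f * r for f, r in rows) / (lambda_ + sum(f * f for f, _ in rows))
        # Fix U, then solve the same kind of problem per product.
        for prod in products:
            rows = [(u[uu], r) for uu, p, r in ratings if p == prod]
            v[prod] = sum(f * r for f, r in rows) / (lambda_ + sum(f * f for f, _ in rows))
    return u, v


def squared_error(ratings, u, v):
    """Sum of squared differences between observed and reconstructed ratings."""
    return sum((r - u[a] * v[b]) ** 2 for a, b, r in ratings)


ratings = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]
u, v = als_rank1(ratings, iterations=10, seed=10)
print(squared_error(ratings, u, v))
print(u[2] * v[2])  # prediction for the unobserved pair (2, 2)
```

Each half-step has a closed-form solution because, with one side held fixed, the objective is an ordinary regularized least-squares problem in the other side. With a higher rank the scalar division becomes a small linear solve per user or product.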

classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)[source]
Train a matrix factorization model given an RDD of ‘implicit preferences’ given by users to some products, in the form of (userID, productID, preference) pairs. We approximate the ratings matrix as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, we run a given number of iterations of ALS. This is done using a level of parallelism given by blocks.

New in version 0.9.0.
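
The Spark MLlib guide attributes the implicit variant to the approach of Hu, Koren and Volinsky, in which each raw observation is treated as a binary preference plus a confidence weight rather than as a rating. The sketch below shows that split in plain Python. The formula `1 + alpha * strength` comes from that paper; whether Spark applies it in exactly this form is an assumption here, and `to_preference_and_confidence` and the sample data are hypothetical.

```python
def to_preference_and_confidence(strength, alpha=0.01):
    """Split one implicit observation into (preference, confidence).

    preference: 1.0 if any interaction was observed, else 0.0
    confidence: 1 + alpha * strength, so stronger signals weigh more
    """
    preference = 1.0 if strength > 0 else 0.0
    confidence = 1.0 + alpha * strength
    return preference, confidence


# Hypothetical (userID, productID, view count) observations.
observations = [(1, 1, 1.0), (1, 2, 20.0), (2, 1, 0.0)]

for user, product, strength in observations:
    print(user, product, to_preference_and_confidence(strength, alpha=0.01))
```

This is why predictions from an implicitly trained model, such as the `0.4...` in the transcript, read as preference scores rather than values on the original rating scale.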

class pyspark.mllib.recommendation.Rating[source]
Represents a (user, product, rating) tuple.

>>> r = Rating(1, 2, 5.0)
>>> (r.user, r.product, r.rating)
(1, 2, 5.0)
>>> (r[0], r[1], r[2])
(1, 2, 5.0)


New in version 1.2.0.

Deeplearn, all rights reserved. Unless otherwise noted, all content is original. This site is licensed under BY-NC-SA; when reposting, please credit the source article.
