
Spark MLlib Collaborative Filtering API: Translation with Notes

bigdata · admin · 2017-05-27

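I recently went through Spark's collaborative filtering API and used the sample code to write a first version of a product recommender. Below I translate and annotate the API functions of this module, in case someone finds it useful; it also helps me consolidate my own understanding. On with the big-data journey!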

pyspark.mllib.recommendation module: annotated API reference

class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model)
A matrix factorisation model trained by regularized alternating least-squares.

>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
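Rating data as (user, product, rating) tuples. I'd suggest not copying the official example here: building the RDD from Rating objects makes the data much easier to read.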
>>> ratings = sc.parallelize([r1, r2, r3])
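trainImplicit trains on implicit feedback: the values are strengths of observed behavior rather than explicit ratings, as opposed to explicit training with ALS.train. The second argument is the rank, i.e. the number of latent features in the factorization A ≈ U * V^T (the shared inner dimension of U and V). seed is the random seed used to initialize the factor matrices.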
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...
>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]
>>> model = ALS.train(ratings, 4, seed=10)
userFeatures() returns the U matrix of the factorization above, one latent vector per user:
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
Top 2 users recommended for product 1 (recommendUsers takes a product ID; these are not "similar users"):
>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
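Top 2 products recommended for user 1 (recommendProducts takes a user ID; these are not "similar products"):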
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4
Each user's row of U has exactly rank entries:
>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4
productFeatures() returns the V matrix mentioned above, one latent vector per product:
>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4
>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))
>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))
>>> model = ALS.train(ratings, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
3.73...
>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
3.73...
>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=10)
>>> model.predict(2, 2)
0.4...
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
Save the model, then load it back:
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

New in version 0.9.0.

classmethod load(sc, path)
Load a model from the given path.

New in version 1.3.1.

predict(user, product)
Predicts rating for the given user and product.

New in version 0.9.0.

predictAll(user_product)
Returns a list of predicted ratings for input user and product pairs.

New in version 0.9.0.
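A minimal sketch of what predict and predictAll compute, assuming the trained factors are held in plain Python dicts (the dicts and function names below are hypothetical, not part of PySpark): a predicted rating is simply the dot product of the user's row of U and the product's row of V. Skipping pairs with an unknown user or product ID is an assumption of this sketch, modeled as a join on the two factor tables.

```python
from array import array

# Hypothetical rank-2 factors standing in for userFeatures() / productFeatures().
user_factors = {1: array('d', [0.5, 1.0]), 2: array('d', [1.0, 0.5])}
product_factors = {1: array('d', [1.0, 0.0]), 2: array('d', [1.0, 2.0])}


def predict(user, product):
    """Predicted rating = dot product of the user and product latent vectors."""
    u = user_factors[user]
    v = product_factors[product]
    return sum(a * b for a, b in zip(u, v))


def predict_all(pairs):
    """Score every (user, product) pair whose user and product both have factors.

    Pairs with an unknown ID are skipped (an assumption of this sketch).
    """
    return [
        (u, p, predict(u, p))
        for u, p in pairs
        if u in user_factors and p in product_factors
    ]


print(predict(1, 2))                          # 0.5*1.0 + 1.0*2.0 = 2.5
print(predict_all([(1, 2), (1, 1), (3, 1)]))  # user 3 has no factors, so it is dropped
```

Because every prediction is just a dot product, scoring is cheap once U and V are trained; the expensive part is the ALS fitting itself.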

productFeatures()
The V matrix of the factorization (one row per product).

Returns a paired RDD, where the first element is the product ID and the second is an array of features corresponding to that product.

New in version 1.2.0.
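To make the shape of that paired RDD concrete, here is a small sketch using a hypothetical hand-written list shaped like productFeatures().collect(): each key is a product ID (not a product name) and each value is an array('d') whose length equals the model's rank. The numbers are made up for illustration.

```python
from array import array

rank = 3
# Hypothetical data shaped like productFeatures().collect():
# a list of (productId, array('d', latent_vector)) pairs.
product_features = [
    (1, array('d', [0.1, 0.2, 0.3])),
    (2, array('d', [0.4, 0.5, 0.6])),
]

# Every latent vector has exactly `rank` entries.
assert all(len(vec) == rank for _, vec in product_features)

# Turn the paired list into a lookup table keyed by product ID.
by_id = dict(product_features)
print(by_id[2].tolist())  # [0.4, 0.5, 0.6]
```

On a real cluster you would usually keep this as an RDD (or collectAsMap() it only when the product catalog is small).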

rank
Rank for the features in this model, i.e. the rank argument passed to ALS.train or ALS.trainImplicit when the model was trained.

New in version 1.4.0.

recommendProducts(user, num)
Recommends the top “num” number of products for a given user and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.
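A minimal sketch of the idea behind recommendProducts, assuming hypothetical in-memory factors: score every product for the given user with a dot product and keep the `num` highest, already sorted in descending order. heapq.nlargest is this sketch's choice, not necessarily what Spark does internally, and the sketch does not filter out products the user has already rated.

```python
import heapq

# Hypothetical rank-2 factors (stand-ins for a trained model's U and V rows).
user_factors = {1: [0.5, 1.0], 2: [1.0, 0.5]}
product_factors = {1: [1.0, 0.0], 2: [1.0, 2.0], 3: [0.2, 0.2]}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recommend_products(user, num):
    """Top `num` (user, product, score) triples for `user`, highest score first."""
    u = user_factors[user]
    scored = ((user, p, dot(u, v)) for p, v in product_factors.items())
    return heapq.nlargest(num, scored, key=lambda t: t[2])


print(recommend_products(1, 2))
```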

recommendProductsForUsers(num)
Recommends top “num” products for all users. The number returned may be less than this.
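A sketch of the batch version under the same hypothetical in-memory factors: compute the top `num` products for every user at once, returning a tuple per user like the (1, (Rating(...),)) doctest output above. It also shows one simple way "the number returned may be less than this" happens: when only two products exist, asking for five yields two per user.

```python
import heapq

# Hypothetical factors: only two products exist.
user_factors = {1: [0.5, 1.0], 2: [1.0, 0.5]}
product_factors = {1: [1.0, 0.0], 2: [1.0, 2.0]}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def recommend_products_for_users(num):
    """Map each user ID to a tuple of its top `num` (user, product, score) triples."""
    result = {}
    for user, u in user_factors.items():
        scored = ((user, p, dot(u, v)) for p, v in product_factors.items())
        # nlargest returns every candidate when there are fewer than `num`.
        result[user] = tuple(heapq.nlargest(num, scored, key=lambda t: t[2]))
    return result


recs = recommend_products_for_users(5)
print({user: len(top) for user, top in recs.items()})  # {1: 2, 2: 2}
```

recommendUsersForProducts(num) is the mirror image: swap the roles of the two factor tables.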

recommendUsers(product, num)
Recommends the top “num” number of users for a given product and returns a list of Rating objects sorted by the predicted rating in descending order.

New in version 1.4.0.

recommendUsersForProducts(num)
Recommends top “num” users for all products. The number returned may be less than this.

userFeatures()
The U matrix of the factorization (one row per user).
Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

New in version 1.2.0.

class pyspark.mllib.recommendation.ALS
Alternating Least Squares matrix factorization

New in version 0.9.0.

classmethod train(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None)
Train a matrix factorization model given an RDD of ratings given by users to some products, in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, we run a given number of iterations of ALS. This is done using a level of parallelism given by blocks.
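To see what "run a given number of iterations of ALS" means, here is a single-machine sketch in plain Python, not Spark's implementation: it alternately fixes the product factors and solves a small ridge-regression problem per user, then fixes the user factors and does the same per product. The function names (solve, update, als) are hypothetical. It uses plain, unscaled L2 regularization; Spark's MLlib solver differs in details such as blocking for parallelism and scaling lambda_ by each user's or product's rating count.

```python
import random


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def update(fixed, observed, rank, lam):
    """Ridge-regression update of one side's factors, holding the other side fixed.

    `observed` maps an ID to its list of (other_id, rating) pairs.
    Solves (sum y y^T + lam I) x = sum r y for each ID.
    """
    new = {}
    for key, pairs in observed.items():
        a = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, r in pairs:
            y = fixed[other]
            for i in range(rank):
                b[i] += r * y[i]
                for j in range(rank):
                    a[i][j] += y[i] * y[j]
        new[key] = solve(a, b)
    return new


def als(ratings, rank, iterations=5, lam=0.01, seed=None):
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, p, r in ratings:
        by_user.setdefault(u, []).append((p, r))
        by_item.setdefault(p, []).append((u, r))
    # Random initial product factors (this is where `seed` matters); users are solved first.
    items = {p: [rng.random() for _ in range(rank)] for p in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)
        items = update(users, by_item, rank, lam)
    return users, items


def predict(users, items, u, p):
    return sum(a * b for a, b in zip(users[u], items[p]))


ratings = [(1, 1, 1.0), (1, 2, 2.0), (2, 1, 2.0)]
users, items = als(ratings, rank=2, iterations=10, seed=0)
for u, p, r in ratings:
    print(u, p, r, round(predict(users, items, u, p), 2))
```

Each half-step is an ordinary least-squares problem because one side is held fixed; that is what makes the alternating scheme easy to parallelize over users and products.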

New in version 0.9.0.

classmethod trainImplicit(ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None)
Train a matrix factorization model given an RDD of ‘implicit preferences’ given by users to some products, in the form of (userID, productID, preference) pairs. We approximate the ratings matrix as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, we run a given number of iterations of ALS. This is done using a level of parallelism given by blocks.
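The implicit variant is commonly described (following Hu, Koren and Volinsky) as turning each observed strength r into a binary preference p = 1 if r > 0 else 0 and a confidence c = 1 + alpha * |r|, then fitting the preference over all user/product pairs, with unobserved pairs counted as preference 0 at confidence 1. The sketch below only evaluates that weighted objective for given factors; the function names are hypothetical and this is not Spark's solver. It also suggests why model.predict(2, 2) after trainImplicit returned about 0.4 in the doctest: the model estimates a preference, not a rating on the original scale.

```python
def confidence_and_preference(r, alpha):
    """Implicit-feedback transform: strength r -> (confidence c, binary preference p)."""
    c = 1.0 + alpha * abs(r)
    p = 1.0 if r > 0 else 0.0
    return c, p


def implicit_loss(observed, users, items, alpha, lam):
    """Confidence-weighted squared error over *all* user/item pairs plus L2 regularization.

    `observed` maps (user, item) -> strength; missing pairs count as strength 0.
    """
    loss = 0.0
    for u, x in users.items():
        for i, y in items.items():
            c, p = confidence_and_preference(observed.get((u, i), 0.0), alpha)
            pred = sum(a * b for a, b in zip(x, y))
            loss += c * (p - pred) ** 2
    all_vectors = list(users.values()) + list(items.values())
    loss += lam * sum(v * v for vec in all_vectors for v in vec)
    return loss


observed = {(1, 1): 1.0, (1, 2): 2.0, (2, 1): 2.0}
zeros = {1: [0.0], 2: [0.0]}  # rank-1 all-zero factors, for illustration only
print(implicit_loss(observed, zeros, zeros, alpha=0.01, lam=0.01))  # approximately 3.05
```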

New in version 0.9.0.

class pyspark.mllib.recommendation.Rating
Represents a (user, product, rating) tuple.

>>> r = Rating(1, 2, 5.0)
>>> (r.user, r.product, r.rating)
(1, 2, 5.0)
>>> (r[0], r[1], r[2])
(1, 2, 5.0)

New in version 1.2.0.
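Following the earlier advice to build ratings from Rating objects instead of bare tuples, here is a sketch that runs without Spark. It uses a stand-in namedtuple with the same three fields as pyspark.mllib.recommendation.Rating (which is itself a namedtuple subclass); with a SparkContext you would import the real Rating and pass the resulting list to sc.parallelize. The explicit int/float conversions are this sketch's choice, useful when the raw data comes from text files.

```python
from collections import namedtuple

# Stand-in with the same fields as pyspark.mllib.recommendation.Rating;
# use the real class when working with Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

raw = [("1", "1", "1.0"), ("1", "2", "2.0"), ("2", "1", "2.0")]  # e.g. parsed CSV fields
ratings = [Rating(int(u), int(p), float(r)) for u, p, r in raw]

r = ratings[1]
print(r.user, r.product, r.rating)  # 1 2 2.0
print((r[0], r[1], r[2]))           # (1, 2, 2.0)
```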


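© Deeplearn. Unless otherwise noted, all articles are original and licensed under CC BY-NC-SA; when reposting, please credit the source: Spark MLlib Collaborative Filtering API: Translation with Notes.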
About the author: admin, a programmer in the internet industry, part-time cat servant, digital audio/video enthusiast, and ACG fan.
