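I use PyHive, which its author is still actively maintaining on GitHub as far as I can tell. Link: pyhive
It is just a tool, so we can use it as is. Your server must of course have the corresponding service (HiveServer2) enabled before a client can connect to it.
On macOS, installing the following packages is enough: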
pip install pyhive
pip install thrift
pip install sasl
pip install thrift-sasl
Here is a simple example:
# -*- coding: utf-8 -*-
# @Time : 2018/6/19 11:32 AM
# @Author : zhusimaji
# @File : python_hive.py
# @Software: PyCharm
from pyhive import hive

PORT = 10000
name = "*****"
password = "*****"

# Connect to HiveServer2 with LDAP authentication
conn = hive.Connection(host="your host", port=PORT, username=name, database='stg', auth='LDAP', password=password)
cursor = conn.cursor()

# Run a query and print every returned row
cursor.execute("SELECT col1 FROM table123 LIMIT 10")
for result in cursor.fetchall():
    print(result)
The default port is usually 10000. Next, let's look at the initialization parameters of the Connection class.
def __init__(self, host=None, port=None, username=None, database='default', auth=None,
             configuration=None, kerberos_service_name=None, password=None,
             thrift_transport=None):
    """
    :param host: the host to connect to
    :param port: the Hive service port. Defaults to 10000.
    :param auth: The value of hive.server2.authentication used by HiveServer2
        (the authentication parameter). Defaults to ``NONE``.
    :param configuration: A dictionary of Hive settings (functionally same as the `set`
        command), i.e. Hive parameters
    :param kerberos_service_name: Use with auth='KERBEROS' only
    :param password: Use with auth='LDAP' or auth='CUSTOM' only. A password is required
        when the auth parameter above is LDAP or CUSTOM.
    :param thrift_transport: A ``TTransportBase`` for custom advanced usage.
        Incompatible with host, port, auth, kerberos_service_name, and password.
    """
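The rule in the docstring above (a password goes together with auth='LDAP' or auth='CUSTOM', and with nothing else) can be checked before a connection is even attempted. The following is only a minimal sketch: `build_hive_kwargs` and `connect` are hypothetical helper names that are not part of PyHive, and the queue setting in the usage lines is just an example value.

```python
def build_hive_kwargs(host, username, port=10000, database="default",
                      auth=None, password=None, configuration=None):
    """Collect keyword arguments for hive.Connection (hypothetical helper)."""
    # Mirror the docstring: password is only meaningful for LDAP or CUSTOM.
    needs_password = auth in ("LDAP", "CUSTOM")
    if (password is not None) != needs_password:
        raise ValueError(
            "password must be set if and only if auth is 'LDAP' or 'CUSTOM'")

    kwargs = {"host": host, "port": port,
              "username": username, "database": database}
    if auth is not None:
        kwargs["auth"] = auth
    if password is not None:
        kwargs["password"] = password
    if configuration:
        # Copy so later changes by the caller do not leak into the connection.
        kwargs["configuration"] = dict(configuration)
    return kwargs


def connect(**options):
    """Open a connection; pyhive is imported lazily, only when called."""
    from pyhive import hive
    return hive.Connection(**build_hive_kwargs(**options))


if __name__ == "__main__":
    # Example values only; replace them with your own host and credentials.
    print(build_hive_kwargs("your host", "*****", database="stg",
                            auth="LDAP", password="*****",
                            configuration={"mapreduce.job.queuename": "default"}))
```

Keeping the validation separate from the actual `hive.Connection` call means the argument handling can be exercised on a machine that has no Hive server available.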
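That is why the test code above, which uses LDAP authentication, has to supply the matching username and password.
Connection has a few commonly used methods, briefly explained here: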
# close, as the name suggests, closes the connection
def close(self):
    """Close the underlying session and Thrift transport"""
    req = ttypes.TCloseSessionReq(sessionHandle=self._sessionHandle)
    response = self._client.CloseSession(req)
    self._transport.close()
    _check_status(response)

# commit only matters for databases such as MySQL that support transactions;
# Hive does not, so this is a no-op
def commit(self):
    """Hive does not support transactions, so this does nothing."""
    pass

# cursor returns the cursor that is later used to submit SQL queries
def cursor(self, *args, **kwargs):
    """Return a new :py:class:`Cursor` object using the connection."""
    return Cursor(self, *args, **kwargs)

@property
def client(self):
    return self._client

@property
def sessionHandle(self):
    return self._sessionHandle

# MySQL supports transactions and can therefore roll back; Hive cannot
def rollback(self):
    raise NotSupportedError("Hive does not have transactions")  # pragma: no cover
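Since close() is what releases the session and the Thrift transport, it is worth making sure it runs even when a query fails. One way to do that, sketched below, is the standard library's `contextlib.closing`. `FakeConnection` and `FakeCursor` are made-up stand-ins so that the snippet runs without a Hive server, and `run_query` is a hypothetical helper; with PyHive you would pass the object returned by `hive.Connection(...)` instead.

```python
from contextlib import closing


class FakeCursor:
    """Stand-in for a DB-API cursor (illustration only, not PyHive code)."""

    def __init__(self):
        self.closed = False
        self.last_sql = None

    def execute(self, sql):
        self.last_sql = sql

    def fetchall(self):
        return [("row1",), ("row2",)]

    def close(self):
        self.closed = True


class FakeConnection:
    """Stand-in for hive.Connection (illustration only, not PyHive code)."""

    def __init__(self):
        self.closed = False

    def cursor(self):
        return FakeCursor()

    def close(self):
        self.closed = True


def run_query(conn, sql):
    """Run one query, then close the cursor and the connection, even on errors."""
    with closing(conn), closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return cursor.fetchall()


if __name__ == "__main__":
    connection = FakeConnection()
    print(run_query(connection, "SELECT col1 FROM table123 LIMIT 10"))
    print("connection closed:", connection.closed)
```

`closing` only requires that the object has a close() method, so the same helper works for any DB-API style connection.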
Next, let's look at Cursor.
class Cursor(common.DBAPICursor):
    """These objects represent a database cursor, which is used to manage the context of a fetch
    operation.

    Cursors are not isolated, i.e., any changes done to the database by a cursor are immediately
    visible by other cursors or connections.
    """

    def __init__(self, connection, arraysize=1000):
        self._operationHandle = None
        super(Cursor, self).__init__()
        self._arraysize = arraysize
        self._connection = connection
This class inherits from DBAPICursor, which already defines many methods. The cursor.fetchall() call in the earlier sample code, for instance, is defined in that parent class.
That covers the basics.
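One practical note on fetchall(): it loads the whole result set into memory at once. DB-API cursors also offer fetchmany(), so large results can be read in batches. The sketch below assumes only the standard DB-API cursor interface; `iter_rows` and `demo` are hypothetical helpers, and sqlite3 is used purely as a stand-in so the example runs without a Hive server.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            # An empty batch means the result set is exhausted.
            break
        for row in batch:
            yield row


def demo():
    """Read a small in-memory table in batches of two rows."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE table123 (col1 INTEGER)")
        cur.executemany("INSERT INTO table123 VALUES (?)",
                        [(i,) for i in range(5)])
        cur.execute("SELECT col1 FROM table123 ORDER BY col1")
        return list(iter_rows(cur, batch_size=2))
    finally:
        conn.close()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With PyHive the same generator should be usable on the cursor returned by conn.cursor() after execute(), with a batch size chosen to fit your memory budget.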