• 为了保证你在浏览本网站时有着更好的体验,建议使用类似Chrome、Firefox之类的浏览器~~
    • 如果你喜欢本站的内容何不Ctrl+D收藏一下呢,与大家一起分享各种编程知识~
    • 本网站研究机器学习、计算机视觉、模式识别~当然不局限于此,生命在于折腾,何不年轻时多折腾一下

HIVE正则表达式

Sql admin 3年前 (2017-05-24) 4790次浏览 0个评论 扫描二维码

随着 HIVE 的持续使用,现在业务上也需要匹配文本,一般的加减乘除之类以及统计聚合运算都满足不了了,如果是短的字符串可以使用字符串相关的处理,但是如果是大量的文本都需要正则表达式从中提取出想要的内容

,下面的例子都是 copy 过来的,作为相关函数的说明,会使用就好,来自 http://blog.csdn.net/bitcarmanlee/article/details/51106726

需要注意的是转义字符,一般情况下比如匹配数字一般写代码\d 就行了,在 hive 中需要双斜杠\\d

相关正则表达式的内容可以参考正则表达式

1.regexp

语法: A REGEXP B
操作类型: strings
描述: 功能与 RLIKE 相同

<code class="hljs sql has-numbering"><span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> regexp <span class="hljs-string">'\\d{8}'</span></span></code>

与下面查询的效果是等效的:

<code class="hljs sql has-numbering"><span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> rlike <span class="hljs-string">'\\d{8}'</span>;</span></code>

2.regexp_extract

语法: regexp_extract(string subject, string pattern, int index)
返回值: string
说明:将字符串 subject 按照 pattern 正则表达式的规则拆分,返回 index 指定的字符。

<code class="hljs r has-numbering">hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
Total MapReduce CPU Time Spent: <span class="hljs-number">7</span> seconds <span class="hljs-number">340</span> msec
OK
love
Time taken: <span class="hljs-number">28.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>
<code class="hljs r has-numbering">hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">2</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
You
Time taken: <span class="hljs-number">26.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>
<code class="hljs r has-numbering">hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
I
Time taken: <span class="hljs-number">26.057</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>
<code class="hljs r has-numbering">hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">0</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
IloveYou
Time taken: <span class="hljs-number">28.06</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>
<code class="hljs r has-numbering">hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilove
Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>

 

3.regexp_replace

语法: regexp_replace(string A, string B, string C)
返回值: string
说明:将字符串 A 中的符合Java正则表达式 B 的部分替换为 C。注意,在有些情况下要使用转义字符,类似Oracle中的 regexp_replace 函数。

<code class="hljs r has-numbering">hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilove
Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)</code>

 

<code class="hljs r has-numbering">hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">"lili"</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilovelili</code>

Deeplearn, 版权所有丨如未注明 , 均为原创丨本网站采用BY-NC-SA协议进行授权 , 转载请注明HIVE 正则表达式
喜欢 (0)
admin
关于作者:
互联网行业码农一枚/业余铲屎官/数码影音爱好者/二次元

您必须 登录 才能发表评论!