HIVE正则表达式

6,730次阅读
没有评论

随着HIVE的持续使用,现在业务上也需要匹配文本,一般的加减乘除之类以及统计聚合运算都满足不了了,如果是短的字符串可以使用字符串相关的处理,但是如果是大量的文本都需要正则表达式从中提取出想要的内容

,下面的例子都是copy过来的,作为相关函数的说明,会使用就好,来自http://blog.csdn.net/bitcarmanlee/article/details/51106726

需要注意的是转义字符,一般情况下比如匹配数字一般写代码\d就行了,在hive中需要双斜杠\\d

相关正则表达式的内容可以参考正则表达式

1.regexp

语法: A REGEXP B 操作类型: strings 描述: 功能与RLIKE相同

<span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> regexp <span class="hljs-string">'\\d{8}'</span></span>

与下面查询的效果是等效的:

<span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> rlike <span class="hljs-string">'\\d{8}'</span>;</span>

2.regexp_extract

语法: regexp_extract(string subject, string pattern, int index) 返回值: string 说明:将字符串subject按照pattern正则表达式的规则拆分,返回index指定的字符。

hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
Total MapReduce CPU Time Spent: <span class="hljs-number">7</span> seconds <span class="hljs-number">340</span> msec
OK
love
Time taken: <span class="hljs-number">28.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">2</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
You
Time taken: <span class="hljs-number">26.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
I
Time taken: <span class="hljs-number">26.057</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">0</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
IloveYou
Time taken: <span class="hljs-number">28.06</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilove
Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)

 

3.regexp_replace

语法: regexp_replace(string A, string B, string C) 返回值: string 说明:将字符串A中的符合Java正则表达式B的部分替换为C。注意,在有些情况下要使用转义字符,类似Oracle中的regexp_replace函数。

hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilove
Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)

 

hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">"lili"</span>) from test1 limit <span class="hljs-number">1</span>;
Total jobs = <span class="hljs-number">1</span>
<span class="hljs-keyword">...</span>
OK
Ilovelili
admin
版权声明:本站原创文章,由admin2017-05-24发表,共计1631字。
转载提示:除特殊说明外本站文章皆由CC-4.0协议发布,转载请注明出处。
评论(没有评论)