随着HIVE的持续使用,现在业务上也需要匹配文本,一般的加减乘除之类以及统计聚合运算都满足不了了,如果是短的字符串可以使用字符串相关的处理,但是如果是大量的文本都需要正则表达式从中提取出想要的内容
,下面的例子都是copy过来的,作为相关函数的说明,会使用就好,来自http://blog.csdn.net/bitcarmanlee/article/details/51106726
需要注意的是转义字符,一般情况下比如匹配数字一般写代码\d就行了,在hive中需要双斜杠\\d
相关正则表达式的内容可以参考正则表达式
1.regexp
语法: A REGEXP B 操作类型: strings 描述: 功能与RLIKE相同
<span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> regexp <span class="hljs-string">'\\d{8}'</span></span>
与下面查询的效果是等效的:
<span class="hljs-operator"><span class="hljs-keyword">select</span> <span class="hljs-aggregate">count</span>(*) <span class="hljs-keyword">from</span> olap_b_dw_hotelorder_f <span class="hljs-keyword">where</span> create_date_wid <span class="hljs-keyword">not</span> rlike <span class="hljs-string">'\\d{8}'</span>;</span>
2.regexp_extract
语法: regexp_extract(string subject, string pattern, int index) 返回值: string 说明:将字符串subject按照pattern正则表达式的规则拆分,返回index指定的字符。
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> Total MapReduce CPU Time Spent: <span class="hljs-number">7</span> seconds <span class="hljs-number">340</span> msec OK love Time taken: <span class="hljs-number">28.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'I(.*?)(You)'</span>,<span class="hljs-number">2</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK You Time taken: <span class="hljs-number">26.067</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">1</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK I Time taken: <span class="hljs-number">26.057</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_extract(<span class="hljs-string">'IloveYou'</span>,<span class="hljs-string">'(I)(.*?)(You)'</span>,<span class="hljs-number">0</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK IloveYou Time taken: <span class="hljs-number">28.06</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK Ilove Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
3.regexp_replace
语法: regexp_replace(string A, string B, string C) 返回值: string 说明:将字符串A中的符合Java正则表达式B的部分替换为C。注意,在有些情况下要使用转义字符,类似Oracle中的regexp_replace函数。
hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">""</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK Ilove Time taken: <span class="hljs-number">26.063</span> seconds, Fetched: <span class="hljs-number">1</span> row(s)
hive> select regexp_replace(<span class="hljs-string">"IloveYou"</span>,<span class="hljs-string">"You"</span>,<span class="hljs-string">"lili"</span>) from test1 limit <span class="hljs-number">1</span>; Total jobs = <span class="hljs-number">1</span> <span class="hljs-keyword">...</span> OK Ilovelili