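At the time of writing, Hive has no built-in API for automatically generating a row index, whereas many SQL dialects offer an identity facility for this (for example, IDENTITY in SQL Server). To fill the gap, Apache provides a workaround: the UDFRowSequence UDF, whose source is reproduced below. It is quite useful in practice. In my case, the field I need to process contains some values that are purely numeric and others that mix letters and digits, so the same function cannot always be applied to all of them directly. Instead, each value can be mapped to an extra generated index field, the processing can run against that index, and the results can then be mapped back to the original data.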
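The mapping idea described above can be sketched in plain Java, outside of Hive. This is only an illustration under assumptions: the class `SurrogateKeyDemo`, its methods, and the sample values are hypothetical, and `process` is a placeholder for whatever numeric-only function the real job applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "map to an index, process, map back" idea.
// None of these names come from Hive.
class SurrogateKeyDemo {

    // Give each raw value a sequence number starting from 1, in input order.
    static Map<Long, String> assignIds(List<String> rawValues) {
        Map<Long, String> idToValue = new LinkedHashMap<>();
        long next = 0;
        for (String value : rawValues) {
            next += 1;
            idToValue.put(next, value);
        }
        return idToValue;
    }

    // Placeholder for a function that only works on numbers (assumption).
    static long process(long id) {
        return id * 2;
    }

    // Run the numeric function on the ids, then attach each result
    // to the original value it belongs to.
    static List<String> processAndMapBack(Map<Long, String> idToValue) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : idToValue.entrySet()) {
            out.add(e.getValue() + " -> " + process(e.getKey()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Mixed input: some values are all digits, some are not.
        Map<Long, String> ids = assignIds(Arrays.asList("1001", "A17", "42", "B-9"));
        for (String line : processAndMapBack(ids)) {
            System.out.println(line);
        }
    }
}
```

The generated id, not the original value, is what the numeric function sees, so mixed alphanumeric values never have to pass through it.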
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.hive.contrib.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.LongWritable;

/**
 * UDFRowSequence.
 */
@Description(name = "row_sequence",
    value = "_FUNC_() - Returns a generated row sequence number starting from 1")
@UDFType(deterministic = false, stateful = true)
public class UDFRowSequence extends UDF {
  private LongWritable result = new LongWritable();

  public UDFRowSequence() {
    result.set(0);
  }

  public LongWritable evaluate() {
    result.set(result.get() + 1);
    return result;
  }
}

// End UDFRowSequence.java
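To see why the UDF above is annotated as stateful and non-deterministic, here is a plain-Java stand-in that can be run without Hive or Hadoop on the classpath. It is a sketch, not the Hive class: `RowSequenceSketch` is a hypothetical name, and it uses a primitive `long` in place of `LongWritable`. The two instances are meant to suggest two separate tasks; my understanding is that each task creates its own UDF object, so numbering would restart in each one, but treat that as an assumption to verify against your own Hive version.

```java
// Plain-Java stand-in for the counter logic in UDFRowSequence.
// Hypothetical class; uses long instead of Hadoop's LongWritable.
class RowSequenceSketch {
    // State kept between calls: the last number handed out.
    private long current = 0;

    // Each call returns the next number, so the same (empty) input
    // gives a different output every time: stateful, not deterministic.
    long evaluate() {
        current += 1;
        return current;
    }

    public static void main(String[] args) {
        RowSequenceSketch taskA = new RowSequenceSketch();
        RowSequenceSketch taskB = new RowSequenceSketch();
        System.out.println("taskA: " + taskA.evaluate());
        System.out.println("taskA: " + taskA.evaluate());
        // A separate instance keeps its own counter and starts over.
        System.out.println("taskB: " + taskB.evaluate());
    }
}
```

If the per-instance assumption holds, the generated numbers are unique only within one task, which matters when the index is later used to map results back to the original rows.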
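Package this class into a jar, then load it in a Hive session and register it as a UDF (typically with ADD JAR followed by CREATE TEMPORARY FUNCTION). Calling the function in a query then gives each row an automatically generated index.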