SQL Server 不允许对长度超过900字节的varchar字段或超过450字节的nvarchar字段作成索引。遇上对这类字段做查询时,就只能做全表查询,速度可想而知地慢。
网上看一篇文章介绍了一个好思路。就是为这个超长字段(F1)追加一个专门用来加速检索的hash字段F2,根据F1的值生成Hash值,保存到F2中,然后对F2做成索引。当需要查询F1时,只需要对F2做查询即可。图例如下:
1)Table做成时
++++ F1 ++++ +++ F2 ++++
++++ Str Hash(Str)
++++ (例4000byte) (16byte)
Insert into xxx (..,F1,F2,..) Values(...,@Str, Hash([@Str),...)
2)查询时
原select ... From ... WHERE F1 = @xx (无法使用INDEX)
现select ... From ... WHERE F2 = Hash(@xx) (使用INDEX)
关键的这个Hash函数如下:(参照http://sqlservernation.com/blogs/howtos/archive/2009/03/15/hashing-large-data-strings.aspx)
使用例 Hash (@xx)= fn_hashbytesMAX(@xx, 'MD5')
create function dbo.fn_hashbytesMAX
(@string nvarchar(max)
,@Algo varchar(10)
)
returns binary(16)
as
/************************************************************
*
* Author: Brandon Galderisi
* Last modified: 03-MAR-2009
* Purpose: uses the system function hashbytes as well
* as sys.fn_varbintohexstr to split an
* nvarchar(max) string and hash in 8000 byte
* chunks hashing each 8000 byte chunk,,
* getting the 32 byte output, streaming each
* 32 byte output into a string then hashing
* that string.
*
*************************************************************/
begin
declare @concat varchar(max)
,@concatX XML
,@NumHash int
,@HASH binary(16)
set @NumHash = ceiling((datalength(@string)/2)/(4000.0))
/* HashBytes only supports 8000 bytes so split the string if it is larger */
if @NumHash>1
begin
-- # * 4000 character strings
;with a as (select 1 as n union all select 1) -- 2
,b as (select 1 as n from a ,a a1) -- 4
,c as (select 1 as n from b ,b b1) -- 16
,d as (select 1 as n from c ,c c1) -- 256
,e as (select 1 as n from d ,d d1) -- 65,536
,f as (select 1 as n from e ,e e1) -- 4,294,967,296 = 17+ TRILLION characters
,factored as (select row_number() over (order by n) rn from f)
,factors as (select rn,(rn*4000)+1 factor from factored)
select @concat = cast((
select right(sys.fn_varbintohexstr
(
hashbytes(@Algo, substring(@string, factor - 4000, 4000))
)
, 32) + ''
from Factors
where rn for xml path('')
) as nvarchar(max))
set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo)
end
else
begin
set @HASH = convert(binary(16), hashbytes(@Algo, @string))
end
return @HASH
end
真是个巧妙的想法!