Get all Full Text Search word breakers

I couldn’t find this anywhere online, so I figured it out myself. You can run the script below to get all of the word breakers defined for a given language, that is, the characters that the SQL Server full-text search engine treats as word separators.

To figure it out, I loop through the first thousand Unicode characters (code points 0 to 999). For each one I build a test string with a letter in front of the character and a letter behind it, then pass that string to the full-text parser with sys.dm_fts_parser. If more than one row comes back, the character split the string into separate words. If only one row comes back, it didn't.
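To see what the loop is counting, you can run a single probe by hand. This is a minimal sketch: the hyphen (code point 45) is only an example character, and @c and @probe are illustrative variable names, not part of any API.

```sql
-- code point of the character to test (45 = hyphen, used only as an example)
DECLARE @c int = 45;

-- build the same kind of string the script builds: letter, character, letter
DECLARE @probe nvarchar(10) = N'a' + NCHAR(@c) + N'b';

-- each row is a term produced by the English (1033) word breaker;
-- occurrence gives the order of the term within the parsed string
SELECT occurrence, display_term, special_term
FROM sys.dm_fts_parser(@probe, 1033, 0, 0);
```

The script below does not look at the terms themselves; it only counts how many rows come back for each character.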

You can change the language by replacing the 1033 (U.S. English) LCID argument with the LCID of the language you need.
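If you don't know the LCID for your language, SQL Server lists the languages that full-text search supports in the sys.fulltext_languages catalog view. A minimal lookup, assuming you only need the ID and display name:

```sql
-- list every language the full-text engine supports, with its LCID
-- (for example, 1033 is U.S. English and 1031 is German)
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
```

Whatever value you pick from the lcid column is what goes in place of 1033 in the call to sys.dm_fts_parser.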

-- nchar (UCS-2) code points run from 0 to 65535
-- the most commonly used characters have values below 1000
-- UNICODE('x') returns the code point of a character
-- NCHAR(x) returns the character for a code point

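DECLARE @results TABLE (Breaker nchar(1), IntValue int);

DECLARE @i int = 0;
DECLARE @probe nvarchar(10);
DECLARE @ret int;

WHILE @i < 1000
BEGIN
	-- surround the candidate character with two letters, e.g. N'a-b'
	SET @probe = N'a' + NCHAR(@i) + N'b';

	-- reset, in case the parser rejects a string and the previous count would be reused
	SET @ret = 0;

	-- 1033 = English, first zero = system stoplist, second zero = accent insensitive
	SELECT @ret = COUNT(*) FROM sys.dm_fts_parser(@probe, 1033, 0, 0);

	-- more than one term returned means the character split the string
	IF @ret > 1
	BEGIN
		INSERT INTO @results (Breaker, IntValue) VALUES (NCHAR(@i), @i);
	END

	SET @i = @i + 1;
END

-- include the code point: many breakers (spaces, control characters) are invisible,
-- and DISTINCT on the character alone could merge characters the collation treats as equal
SELECT Breaker, IntValue FROM @results ORDER BY IntValue;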

Algorithmatica

Exploring the science and art of problem solving
