There are several ways to fuzzy search in Postgres. There's the ILIKE
operator for case-insensitive searching, or the pg_trgm
extension for trigram matching for more precise similarity comparisons. However ILIKE
is slow because it scans the entire table, and pg_trgm
isn't very accurate, and neither support multiple words or phrases.
The best way to do a fuzzy search in PostgreSQL is to use the full-text search utilities. They're fast, built-in, and don't require any extra extensions. It's also much more powerful than the other options and can fuzzy search with stemming and other normalization included.
Full-text search requires tsvector
columns which are generated from the text you want to search, using the to_tsvector
function which takes a string and returns a tsvector
object. Queries use the @@
operator to compare that column with a ts_query
object generated by the to_tsquery
function.
Here's an example:
-- create a table with a tsvector column
CREATE TABLE movies (title text, tsv tsvector);
-- insert sample data
INSERT INTO movies (title) VALUES
('12 Angry Men'),
('American Beauty'),
('American History X'),
('Back to the Future'),
('Casablanca'),
('City of God'),
('Fight Club'),
('Forrest Gump'),
('Gladiator'),
('Good Will Hunting'),
('Goodfellas'),
('Inception'),
('Interstellar'),
('Jurassic Park'),
('Life Is Beautiful'),
('Once Upon a Time in America'),
('Pulp Fiction'),
('Saving Private Ryan'),
('Spirited Away'),
('Star Wars: A New Hope'),
('Terminator 2: Judgment Day'),
('The Dark Knight'),
('The Godfather'),
('The Good, the Bad and the Ugly'),
('The Green Mile'),
('The Incredibles'),
('The Lion King'),
('The Lord of the Rings: The Return of the King'),
('The Matrix'),
('The Shawshank Redemption'),
('The Usual Suspects'),
('Whiplash');
-- generate the tsvector column
UPDATE movies
SET tsv = to_tsvector(title);
-- Search for "good"
SELECT title
FROM movies, to_tsquery('good') as q
WHERE tsv @@ q;
PG12+ supports generated columns which can keep the tsvector
column updated automatically. A GIN
index can be added to speed up the search but it's not necessary for small tables.
The to_tsquery
function with a single word will perform an exact match, but it can also be used to perform a prefix search by appending :*
to the end of the word, or a phrase search by using the &
operator to join multiple words together. All of these can be combined to create a fuzzy search query.
SELECT title
FROM movies, to_tsquery('good:* & bad:*') as q
WHERE tsv @@ q
Instead of manually constructing this search string, it can be automatically normalized by using to_tsvector
and combining the terms with some string functions:
SELECT string_agg(lexeme || ':*', ' & ' order by positions)
FROM unnest(to_tsvector('good bad'))
This SQL gets the normalized text from the to_tsvector
function, splits it into individual tokens, appends :*
to each, then joins them with &
to create a single string. The example above takes good bad
and produces good:* & bad:*
which you can then use directly with to_tsquery
to get fuzzy matching with the normalization intact.
Combine it all into a CTE so it can be called easily as stored procedure or function:
WITH search AS (
SELECT to_tsquery(string_agg(lexeme || ':*', ' & ' order by positions)) AS query
FROM unnest(to_tsvector('enter your search query here'))
)
SELECT title
FROM movies, search
WHERE movies.tsv @@ search.query