There are several ways to fuzzy search in Postgres. There's the
ILIKE operator for case-insensitive searching, or the
pg_trgm extension for trigram matching for more precise similarity comparisons. However
ILIKE is slow because it scans the entire table, and
pg_trgm isn't very accurate, and neither support multiple words or phrases.
The best way to do a fuzzy search in PostgreSQL is to use the full-text search utilities. They're fast, built-in, and don't require any extra extensions. It's also much more powerful than the other options and can fuzzy search with stemming and other normalization included.
Full-text search requires
tsvector columns which are generated from the text you want to search, using the
to_tsvector function which takes a string and returns a
tsvector object. Queries use the
@@ operator to compare that column with a
ts_query object generated by the
Here's an example:
-- create a table with a tsvector column CREATE TABLE movies (title text, tsv tsvector); -- insert sample data INSERT INTO movies (title) VALUES ('12 Angry Men'), ('American Beauty'), ('American History X'), ('Back to the Future'), ('Casablanca'), ('City of God'), ('Fight Club'), ('Forrest Gump'), ('Gladiator'), ('Good Will Hunting'), ('Goodfellas'), ('Inception'), ('Interstellar'), ('Jurassic Park'), ('Life Is Beautiful'), ('Once Upon a Time in America'), ('Pulp Fiction'), ('Saving Private Ryan'), ('Spirited Away'), ('Star Wars: A New Hope'), ('Terminator 2: Judgment Day'), ('The Dark Knight'), ('The Godfather'), ('The Good, the Bad and the Ugly'), ('The Green Mile'), ('The Incredibles'), ('The Lion King'), ('The Lord of the Rings: The Return of the King'), ('The Matrix'), ('The Shawshank Redemption'), ('The Usual Suspects'), ('Whiplash'); -- generate the tsvector column UPDATE movies SET tsv = to_tsvector(title); -- Search for "good" SELECT title FROM movies, to_tsquery('good') as q WHERE tsv @@ q;
PG12+ supports generated columns which can keep the
tsvector column updated automatically. A
GIN index can be added to speed up the search but it's not necessary for small tables.
to_tsquery function with a single word will perform an exact match, but it can also be used to perform a prefix search by appending
:* to the end of the word, or a phrase search by using the
& operator to join multiple words together. All of these can be combined to create a fuzzy search query.
SELECT title FROM movies, to_tsquery('good:* & bad:*') as q WHERE tsv @@ q
Instead of manually constructing this search string, it can be automatically normalized by using
to_tsvector and combining the terms with some string functions:
SELECT string_agg(lexeme || ':*', ' & ' order by positions) FROM unnest(to_tsvector('good bad'))
This SQL gets the normalized text from the
to_tsvector function, splits it into individual tokens, appends
:* to each, then joins them with
& to create a single string. The example above takes
good bad and produces
good:* & bad:* which you can then use directly with
to_tsquery to get fuzzy matching with the normalization intact.
Combine it all into a CTE so it can be called easily as stored procedure or function:
WITH search AS ( SELECT to_tsquery(string_agg(lexeme || ':*', ' & ' order by positions)) AS query FROM unnest(to_tsvector('enter your search query here')) ) SELECT title FROM movies, search WHERE movies.tsv @@ search.query