How to use Solr MinHashQParser

Problem Description:

I’m currently trying to integrate Jaccard similarity search using MinHash, and I came across Solr 8.11’s MinHash Query Parser. The docs say:

The queries measure Jaccard similarity between the query string and MinHash fields

How do I implement it correctly?

Following the docs, I added a <field> and a <fieldType> like so:

<field name="min_hash_analysed" type="text_min_hash" multiValued="false" indexed="true" stored="false" />

<fieldType name="text_min_hash" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" minShingleSize="5" outputUnigrams="false" outputUnigramsIfNoShingles="false" maxShingleSize="5" tokenSeparator=" "/>
        <filter class="org.apache.lucene.analysis.minhash.MinHashFilterFactory" bucketCount="512" hashSetSize="1" hashCount="1"/>
    </analyzer>
</fieldType>
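To make the analysis chain above more concrete, here is a simplified Python sketch of what it computes conceptually: word 5-shingles, then a one-permutation MinHash signature with 512 buckets (mirroring bucketCount="512", hashCount="1", hashSetSize="1"), and a Jaccard estimate from two signatures. It is an illustration under stated assumptions, not Lucene's implementation: the regex tokenizer is a crude stand-in for the ICU tokenizer and folding filter, blake2b is an assumed hash function, and Lucene's MinHashFilter may treat hashing and empty buckets differently. All function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import re
from typing import Optional

BUCKET_COUNT = 512  # mirrors bucketCount="512" in the fieldType above


def shingles(text: str, size: int = 5) -> set[str]:
    # Crude stand-in for ICUTokenizer + ICUFoldingFilter: lowercase word tokens.
    tokens = re.findall(r"\w+", text.lower())
    # Word 5-grams joined by a space, like ShingleFilter with outputUnigrams="false".
    # Texts with fewer than `size` tokens yield no shingles at all.
    return {" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def stable_hash(value: str) -> int:
    # Assumed 64-bit hash; Lucene's MinHashFilter uses its own hash function.
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(items: set[str], buckets: int = BUCKET_COUNT) -> list[Optional[int]]:
    # One-permutation MinHash: split the 64-bit hash range into `buckets`
    # equal ranges and keep the smallest hash that falls into each range.
    sig: list[Optional[int]] = [None] * buckets
    for item in items:
        h = stable_hash(item)
        b = (h * buckets) >> 64  # bucket index in [0, buckets)
        if sig[b] is None or h < sig[b]:
            sig[b] = h
    return sig


def estimated_jaccard(a: list[Optional[int]], b: list[Optional[int]]) -> float:
    # Matching non-empty buckets divided by buckets non-empty in either signature.
    matches = sum(1 for x, y in zip(a, b) if x is not None and x == y)
    used = sum(1 for x, y in zip(a, b) if x is not None or y is not None)
    return matches / used if used else 0.0


def exact_jaccard(a: set[str], b: set[str]) -> float:
    # Reference value computed directly on the shingle sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Note what 5-word shingles imply: changing a single word in a short text alters up to five shingles. For example, "the quick brown fox jumps over the lazy dog near the river bank" and the same sentence with "cat" instead of "dog" share only 4 of 14 distinct shingles, so two texts that look very similar can still have a shingle-level Jaccard similarity well below 0.5.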

I tried saving some text to the new min_hash_analysed field and then querying for very similar text using the query provided in the docs:

{!min_hash field="min_hash_analysed" sim="0.5" tp="0.5"}Very similar text to already saved document text

I was hoping to get back all documents with a similarity above sim="0.5", but no matter what I try, I get "numFound":0.
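The sim and tp parameters are easier to reason about with the standard locality-sensitive hashing (LSH) banding formula. Assuming the parser splits the signature into b bands of r hashes and matches a document when any one band agrees exactly (an assumption about the parser's internals, not something stated in the docs quoted above), a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. The sketch below, with hypothetical function names, picks the largest band size that still reaches the requested true-positive rate at the requested similarity. Under this model, sim is a design point on a probability curve rather than a hard score cutoff.

```python
def candidate_probability(s: float, rows: int, bands: int) -> float:
    # Probability that at least one of `bands` bands, each made of `rows`
    # hashes, agrees completely for a pair with Jaccard similarity s.
    return 1.0 - (1.0 - s ** rows) ** bands


def rows_per_band(num_hashes: int, sim: float, tp: float) -> int:
    # Try the most selective banding first (long bands, few of them) and
    # return the first band size whose true-positive rate at `sim` reaches `tp`.
    for rows in range(num_hashes, 0, -1):
        bands = num_hashes // rows
        if candidate_probability(sim, rows, bands) >= tp:
            return rows
    return 1


if __name__ == "__main__":
    # 512 hashes, matching bucketCount="512" with hashCount="1".
    r = rows_per_band(512, sim=0.5, tp=0.5)
    b = 512 // r
    print(f"rows per band: {r}, bands: {b}")
    for s in (0.2, 0.3, 0.5, 0.8):
        print(f"s={s}: P(candidate)={candidate_probability(s, r, b):.3f}")
```

With 512 hashes, sim=0.5 and tp=0.5 this model picks 6 rows per band (85 bands). Pairs well below 0.5 can still match occasionally, and pairs above it can be missed.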

[Screenshot: Solr query result]

Surely I’m doing something wrong. How do I correctly integrate Solr’s MinHash Query Parser?

Solution – 1

Judging by the response, it seems you’re sending {!min_hash field..} directly as a request parameter, not as the Solr query itself, which is given by the q= parameter.

q={!min_hash ..}query text here 

.. would be the correct syntax in the URL (with URL escaping applied as required).
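A minimal Python sketch of building that request with proper escaping, using only the standard library. The base URL and collection name (localhost:8983, mycollection) are placeholders, and min_hash_url is a hypothetical helper; the field name and the sim/tp values come from the question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder Solr endpoint; replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def min_hash_url(field: str, text: str, sim: float = 0.5, tp: float = 0.5) -> str:
    # Local params and query text together form the *value* of q.
    q = f'{{!min_hash field="{field}" sim="{sim}" tp="{tp}"}}{text}'
    # urlencode percent-escapes {, !, ", = and } and turns spaces into "+".
    return SOLR_SELECT + "?" + urlencode({"q": q})


if __name__ == "__main__":
    url = min_hash_url("min_hash_analysed",
                       "Very similar text to already saved document text")
    print(url)
    # Decoding the query string gives back the exact q value Solr will parse.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen(url). Building q through urlencode rather than by string concatenation avoids the braces, quotes and "!" being sent unescaped.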
