Recalculating the score
Posted by Benjamin Krause (Guest)
on 04.07.2006 15:19
Hey ..

I'm using Ferret to index various objects, and I create a 
Ferret::Document for each of these objects. Indexing and searching are 
working fine.

Each of these Ferret::Documents has a 'relevance' field storing an 
integer that says how relevant the object is for the search. The 
'relevance' is in the range 1..10.

Now i would like to multiply the relevance of the document with the 
score, and sort the results by that.

e.g.:
A document with a score of 0.82 and a relevance of 3 should have a final 
score of 2.46

I couldn't figure out how to do this ..

I've read the 'Balancing relevancy and recentness' thread..

>      score = yield( doc, score ) if block_given?
>
> This allows a block attached to a search call to adjust
> document scores before documents are sorted, based on
> some (possibly dynamic) numerical factors associated
> with the document, e.g. the number and importance

i guess this works for the pure ruby implementation but won't work for 
the c-implementation?

> As long as Ferret does what Lucene does with boosts, you could scale
> document boosts at indexing time by some factor related to age and
> that will factor into scoring.  

Boost won't help me here, i've even set the boost value for relevance to 
0.0, as it should not be part of the query..

Is there any way to recalculate the score?

Thanks,
 Ben
Re: Recalculating the score
Posted by David Balmain (Guest)
on 06.07.2006 05:56
On 7/4/06, Benjamin Krause <bk@benjaminkrause.com> wrote:
> Now i would like to multiply the relevance of the document with the
> score, and sort the results by that.
>
> >      score = yield( doc, score ) if block_given?
> >
> > This allows a block attached to a search call to adjust
> > document scores before documents are sorted, based on
> > some (possibly dynamic) numerical factors associated
> > with the document, e.g. the number and importance
>
> i guess this works for the pure ruby implementation but won't work for
> the c-implementation?

Hi Ben,
You are right, this is only possible in the pure ruby version. A more
flexible framework for sorting will be coming in the future but
currently you can only sort by integer, float, string, doc_id, and
relevance.
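
A minimal sketch of one possible stopgap, in plain Ruby and not
something Ferret itself provides: collect the hits first, then re-rank
them by score * relevance. The Hit struct and the rerank helper are
hypothetical names made up for this illustration. In a real application
the score would come from the search and the relevance from the stored
'relevance' field. Note that this only reorders hits that have already
been fetched, so it cannot promote a document that fell outside the
fetched result set.

```ruby
# Hypothetical stand-in for one search hit (not a Ferret class).
Hit = Struct.new(:doc_id, :score, :relevance)

# Re-rank hits by score * relevance, highest combined value first.
def rerank(hits)
  hits.sort_by { |hit| -(hit.score * hit.relevance) }
end

hits = [
  Hit.new(1, 0.90, 1),
  Hit.new(2, 0.82, 3),   # the example from the question: 0.82 * 3
  Hit.new(3, 0.40, 10)
]

rerank(hits).each do |hit|
  puts format("doc %d: %.2f", hit.doc_id, hit.score * hit.relevance)
end
```

The multiplication is done in floating point, so the combined value is
formatted to two decimals when printed.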

> > As long as Ferret does what Lucene does with boosts, you could scale
> > document boosts at indexing time by some factor related to age and
> > that will factor into scoring.
>
> Boost won't help me here, i've even set the boost value for relevance to
> 0.0, as it should not be part of the query..
>
> Is there any way to recalculate the score?

How about setting the boost for the whole document rather than just
the :relevance field? Or do you sometimes want to sort by relevance
without taking the :relevance field into account?

Cheers,
Dave

PS: While we are on the topic, how would you like the sort API to
look? Many have complained that the sort API is too java-like but
no-one has suggested any improvements yet. I'd love to see some ideas.
Re: Recalculating the score
Posted by Benjamin Krause (Guest)
on 07.07.2006 19:23
Hey David,

thanks for the answer ..

> How about setting the boost for the whole document rather than just
> the :relevance field? Or do you sometimes want to sort by relevance
> without taking the :relevance field into account?

ah.. you mean i should boost each field of the document? or is there a 
way to set a boost level for the document as a whole? if so, i've missed 
it ..

> PS: While we are on the topic, how would you like the sort API to
> look? Many have complained that the sort API is too java-like but
> no-one has suggested any improvements yet. I'd love to see some ideas.

I like the idea of giving a short block with a sort algorithm. I would 
like to see something like this:

index.search(:query   => my_query,
             :sort    => Proc.new { |doc| new_score }, # some calculation returning new_score
             :reverse => false,
             :filter  => false,
             :start   => 0,
             :limit   => 10)

Alternatively, you should be able to give the sort param the name of a 
field, like ':sort => :score', or an array of fields, like 
':sort => [ :score, :title ]', which sorts by the first element and then 
by the 2nd if two or more docs share the same value for the 1st element.
I guess something like ":sort => :score" is enough for most people ..

I think the other options are almost like they are implemented right 
now .. I don't think you need the SortField class.

btw.. i do find the filter API not really intuitive, actually i didn't 
understand it at all ;)

i know what you want to do with filters and how you want to get there, 
but i haven't found any understandable documentation, on how to build 
one ..

maybe you should write a short tutorial on how to write a filter.. i 
would find it very intuitive, to have something like a base_query.. like 
having one query to filter/limit results, and have another query to do 
the real search..

and btw.. one feature I would definitely like to see is limiting the 
search to a number of fields..

i know i can write something like

field_one:"search string" || field_two:"search string" ||
field_three:"search string" || field_four:"search string"

but i would like to be able to write something like

(field_one|field_two|field_three|field_four):"search string"

furthermore, you should be able to say something like .. search in all 
fields, except field_one .. like

(*|!field_one):"search string"

Ben
Re: Recalculating the score
Posted by David Balmain (Guest)
on 08.07.2006 01:02
On 7/8/06, Benjamin Krause <bk@benjaminkrause.com> wrote:
> it ..
doc = Ferret::Document::Document.new()
doc.boost = 100.0

>                :reverse => false,
>                :filter => false,
>                :start => 0,
>                :limit => 10 )

The way sort works at the moment is that it caches all fields that are
sorted on. If you start sorting like this, you have to load every
document in the result set, which would be a huge performance hit. I
guess I could make this feature available though.

In the pure ruby version of Ferret you can do this;

    st_length = SortField::SortType.new("length", lambda{|str| str.length})
    sf = SortField.new("content", {:sort_type => st_length,
                                   :reverse => true,
                                   :comparator => lambda{|i,j| j <=> i}})

The sort type lambda is used to build the sort cache: it is applied to
each field value once to produce the cached sort key. The comparator
then compares two of those cached keys. This is flexible while
remaining performant, although I still think I can make it more
intuitive.
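
To make the cache-plus-comparator idea concrete, here is a rough sketch
in plain Ruby. It does not use Ferret's classes. The contents hash, the
to_key lambda and the sort_cache variable are assumptions made up for
the illustration, and the sketch only mimics the two roles described
above.

```ruby
# Hypothetical in-memory "index": doc id => value of the content field.
contents = { 0 => "ferret", 1 => "a", 2 => "lucene port" }

# Plays the role of the sort type lambda: maps a field value to a sort key.
to_key = lambda { |str| str.length }

# Build the cache once, so sorting never has to reload the documents.
sort_cache = contents.transform_values { |str| to_key.call(str) }

# Plays the role of the comparator: j <=> i puts larger keys first.
comparator = lambda { |i, j| j <=> i }

sorted_ids = contents.keys.sort do |a, b|
  comparator.call(sort_cache[a], sort_cache[b])
end

p sorted_ids
```

The point of the split is that the key function runs once per document,
while the comparator only ever touches the cached keys.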

> alternatively you should be able to give the sort param a name of a
> field, like ':sort => :score' or an array of fields like ':sort => [
> :score, :title ]' and sort by the first element and then by the 2nd if
> the two or more docs share the same value for the 1st element.
> I guess something like ":sort => :score" is enough for most people ..

Actually, you can already do this. Have you tried it? The only catch is
that :score is treated as a field name, so to sort by score you'd have
to do this:

    index.search_each(query, :sort => [SortField::RELEVANCE, :title, :price])
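
The ordering such a list of sort fields describes (first key, then the
second on ties, and so on) can be sketched in plain Ruby. The docs array
and its values are made up for the illustration, and the sketch assumes
that sorting by relevance means highest score first.

```ruby
# Hypothetical search results; each hash stands in for one document.
docs = [
  { :title => "Ruby",   :price => 30, :score => 0.9 },
  { :title => "Lucene", :price => 20, :score => 0.9 },
  { :title => "Ferret", :price => 10, :score => 0.5 },
  { :title => "Lucene", :price => 15, :score => 0.9 }
]

# Arrays compare element by element, so ties on the first key fall
# through to the second key, and ties there fall through to the third.
# The score is negated so that higher scores sort first.
sorted = docs.sort_by { |d| [-d[:score], d[:title], d[:price]] }

sorted.each { |d| puts "#{d[:score]} #{d[:title]} #{d[:price]}" }
```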


> maybe you should write a short tutorial on how to write a filter.. i
> would find it very intuitive, to have something like a base_query.. like
> having one query to filter/limit results, and have another query to do
> the real search..

I will. The TermEnum and TermDocEnum are essential for using filters
and they've undergone major changes so I'll hold off on this until I
get the next release out.

> (field_one|field_two|field_three|field_four):"search string"
You can do this already, just get rid of the brackets;

    field_one|field_two|field_three|field_four:"search string"
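
As an illustration only of how the short form relates to the long form
Ben wrote out, here is a small plain-Ruby helper that expands one into
the other. The expand_fields name is hypothetical, and this is not how
Ferret's query parser is implemented. It assumes a single
fields:"phrase" clause with no colon in the field names.

```ruby
# Expand 'a|b:"phrase"' into 'a:"phrase" || b:"phrase"'.
# Hypothetical helper for illustration; not part of Ferret.
def expand_fields(query)
  # Split only on the first colon so the phrase itself stays intact.
  fields, phrase = query.split(":", 2)
  fields.split("|").map { |field| "#{field}:#{phrase}" }.join(" || ")
end

puts expand_fields('field_one|field_two:"search string"')
```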

> furthermore, you should be able to say something like .. search in all
> fields, except field_one .. like
>
> (*|!field_one):"search string"

You can't do this, but it is a nice idea. I'll think about it. I might
also add the brackets into the syntax.


Anyway, thanks for your feedback Ben. I will definitely use it.

Cheers,
Dave