Dynamic use of stopwords


We run a multi-language site (40+ languages) with a search engine that searches through millions of documents in various languages.

One of the problems we have in improving search results is that stop words are language-specific: in the indexer configuration file we list all the available stopword files, but when we run a search we probably want to use only the ones for a specific language.

A word that is a common, non-specific stop word in one language can be a not-so-common term in another.

Is there a way to do this?

Thanks for any ideas or suggestions.


Stopwords are a filter on the words that enter the index's dictionary: they don't exist in the index, so nothing can be done about them at search time. You could have an index for each language (and put docs into indexes depending on their language), each with its own stopword list, or perform manual stopword filtering on the search input in application code (and not use stopwords in the index at all).
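
As a rough illustration of the second option, here is a minimal Python sketch of filtering the search input per language in application code. Everything in it is an assumption made for the example: the `STOPWORDS` table, the `strip_stopwords` and `index_for` helpers, and the `docs_<lang>` index naming are hypothetical, and in practice the word sets would be loaded from your existing stopword files.

```python
# Hypothetical per-language stopword sets (tiny samples for illustration).
# In a real setup these would be loaded from the existing stopword files.
STOPWORDS = {
    "en": {"the", "a", "of", "and"},
    "fr": {"le", "la", "de", "et"},
}


def strip_stopwords(query: str, lang: str) -> str:
    """Remove the stopwords of `lang` from a whitespace-tokenized query."""
    stopwords = STOPWORDS.get(lang, set())
    kept = [tok for tok in query.split() if tok.lower() not in stopwords]
    # If every token was a stopword, keep the original query rather than
    # sending an empty search to the engine.
    return " ".join(kept) if kept else query


def index_for(lang: str) -> str:
    """Name of the per-language index (hypothetical naming scheme)."""
    return f"docs_{lang}"


if __name__ == "__main__":
    query = "the history of la Rochelle"
    print(strip_stopwords(query, "en"))  # "la" is kept for English
    print(strip_stopwords(query, "fr"))  # "la" is dropped for French
```

The same query is cleaned differently depending on the language of the search, which is the behaviour asked for. The `index_for` helper only hints at the first option (one index per language); the two approaches can also be combined.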