Configuration index

Jaroslav_Zeman · February 19, 2022, 10:19am

Hi all

I’m a complete beginner and my first question is about creating an index.

I have some texts that I can divide into several logical parts (e.g. name, tags, text, author, content, index). Should I store the text in the index as a whole or as separate items? I search with a general query string, e.g. “SQL engine speed search”. How does the overall layout of the indexed document affect searching? Is searching in separate sections faster? (The search must still cover all of them, because I don’t know which part contains the text.) Are there any rules for creating indexes in terms of size and speed, or do I have to rely on trial and error? I don’t need text found in the “tag” part to get more weight than text found elsewhere. I just need to find it.

Second question: Does the order of the words in the search string have any effect on matching or on the order of the results? I don’t want to prioritize word order, and I won’t be searching for phrases either. Do I have to set anything in the configuration file for that?

Last question: Is there a way to find text that contains an inaccuracy (the letters of a word are rearranged, a character is missing, or there is an extra character)? For example, take the word “address” written only as “adres”. Can I force the misspelled word to be found without a lemmatizer or other utilities? How do I set this up in the indexer?

Thanks for the answers

Jaroslav

tomat · February 19, 2022, 10:23am

It may be better to start by playing with our interactive courses: https://play.manticoresearch.com/

About full-text search

https://play.manticoresearch.com/fulltextintro/

About auto-correction (“Did you mean?”)

and many others that help you get familiar with Manticore

Sergey · February 19, 2022, 10:30am

Should I store the text in the index together or as separate items?

If you can split your text into multiple logical parts and store them in multiple fields, do it. In most cases it’s better than storing everything in a single field.
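
As a rough illustration of the multi-field layout, here is a minimal sketch. The table name `docs`, its field names, and the sample row are made up for this example; they are not from the original question. It assumes RT mode, where `CREATE TABLE` is available.

```sql
-- Hypothetical table: one full-text field per logical part of the document.
CREATE TABLE docs(title text, tags text, author text, content text);

INSERT INTO docs(id, title, tags, author, content) VALUES
(1, 'Search speed', 'sql engine', 'Jaroslav', 'Notes on full-text indexing');

-- No field operator: each keyword may be found in any of the full-text fields,
-- so the query does not need to know which part holds which word.
SELECT * FROM docs WHERE MATCH('SQL engine speed search');

-- The same data can still be narrowed to selected fields when that is useful.
SELECT * FROM docs WHERE MATCH('@(title,tags) engine');
```

With separate fields, the general "search everywhere" query stays as simple as with one big field, while field-limited queries remain possible later without reindexing.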

Does the order of the words in the search string have any effect on any internal match index and order result?

It doesn’t as long as it’s not a phrase search.
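
A small sketch of the difference, using a made-up table `notes` (an assumption for this example only). The `ranker=bm25` option at the end is my own addition: as far as I know the default ranker, `proximity_bm25`, includes a phrase proximity factor, so choosing plain `bm25` is one way to keep keyword order out of the ranking weight as well.

```sql
-- Hypothetical table with a single full-text field.
CREATE TABLE notes(content text);
INSERT INTO notes(id, content) VALUES (1, 'SQL engine speed search');

-- Plain keyword queries: both require the same four words,
-- so they select the same set of documents regardless of word order.
SELECT * FROM notes WHERE MATCH('SQL engine speed search');
SELECT * FROM notes WHERE MATCH('search speed engine SQL');

-- Phrase query (double quotes inside MATCH): the words must be adjacent
-- and in this exact order, so here the order does matter.
SELECT * FROM notes WHERE MATCH('"engine speed"');

-- Optional: a ranker without a proximity component.
SELECT * FROM notes WHERE MATCH('search speed engine SQL') OPTION ranker=bm25;
```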

Last question: Is there a way to find text that contains an inaccuracy (the letters of a word are rearranged, a character is missing, or there is an extra character)? For example, take the word “address” written only as “adres”. Can I force the misspelled word to be found without a lemmatizer or other utilities? How do I set this up in the indexer?

Not out of the box, but it’s usually not difficult to implement it the way you like in your application using the existing tools:

quorum/proximity search
CALL SUGGEST
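
A minimal sketch of how these tools could be combined. The table `articles` and its content are hypothetical. `CALL SUGGEST` needs infixing enabled on the table, which is why `min_infix_len` is set here; the value 2 is just an example.

```sql
-- Hypothetical table; infixes are enabled so that CALL SUGGEST can work.
CREATE TABLE articles(content text) min_infix_len='2';
INSERT INTO articles(id, content) VALUES (1, 'shipping address of the customer');

-- Quorum: match documents that contain at least 2 of the 3 listed words,
-- so a single misspelled word does not prevent the match.
SELECT * FROM articles WHERE MATCH('"shipping adres customer"/2');

-- Proximity: both words must occur within a window of 5 words, in any order.
SELECT * FROM articles WHERE MATCH('"customer shipping"~5');

-- Ask the dictionary for words close to the misspelled one.
-- The application can then repeat the search with a suggested word.
CALL SUGGEST('adres', 'articles');
```

The idea is that the application first runs the query as typed and, if nothing is found, asks for suggestions and retries with the closest dictionary word.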

Please read the manual https://manual.manticoresearch.com/. And as @tomat said, we have interactive courses that are helpful for beginners - https://play.manticoresearch.com/

Jaroslav_Zeman · February 19, 2022, 10:55am

I can’t use auto-correction, because the word in my question was only an example. In my case, it’s more about finding numbers, such as production numbers.

From the intro to search, I understood that I would use fuzzy search to search for words regardless of their order. Thank …

Jaroslav

Sergey · February 23, 2022, 8:12am

Perhaps ignore_chars can be helpful, e.g.:

mysql> drop table if exists t; create table t(f text) charset_table='0..9' ignore_chars='U+20,-'; insert into t values(0, '8 993-126-53-85'); select * from t where match('8-9931265385'); select * from t where match('8 993-126-5385');
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
create table t(f text) charset_table='0..9' ignore_chars='U+20,-'
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into t values(0, '8 993-126-53-85')
--------------

Query OK, 1 row affected (0.00 sec)

--------------
select * from t where match('8-9931265385')
--------------

+---------------------+-----------------+
| id                  | f               |
+---------------------+-----------------+
| 1514698463906889740 | 8 993-126-53-85 |
+---------------------+-----------------+
1 row in set (0.00 sec)

--------------
select * from t where match('8 993-126-5385')
--------------

+---------------------+-----------------+
| id                  | f               |
+---------------------+-----------------+
| 1514698463906889740 | 8 993-126-53-85 |
+---------------------+-----------------+
1 row in set (0.00 sec)