I’m a complete beginner, and my first question is about creating an index.
I have some texts that I can, in theory, divide into several logical parts (e.g. name, tags, text, author, content, index). Now to the question: should I store the text in the index together or as separate items? I have a general search string, e.g. “SQL engine speed search”. How does the overall layout of the indexed document affect searching? Is searching in divided sections faster? (The search must still cover all the items, because I don’t know which part contains the text.) Are there any rules for creating indexes in terms of size and speed, or do I have to go by trial and error? I don’t need any rules that give text found in the “tag” part more weight than text found elsewhere. I just need to find it.
Second question: Does the order of the words in the search string have any effect on internal index matching and on the order of the results? I don’t want to prioritize word order, and phrases will not be searched either. Do I have to do anything about it in the configuration file?
Last question. Is there a way to find text that contains an inaccuracy (there is an anagram in the word, a character is missing, or there is an extra character)? For example, take the word “address”, written only as “adres”. Can I force the misspelled word to be found without a lemmatizer and other utilities? How do I set this up in the indexer?
Should I store the text in the index together or as separate items?
If you can split your text into multiple logical parts and store them in separate fields, do it. In most cases that is better than storing everything in a single field.
Does the order of the words in the search string have any effect on any internal match index and order result?
It doesn’t as long as it’s not a phrase search.
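To illustrate what non-phrase matching means here, the following is a minimal Python sketch, not the engine's actual code. It assumes the default behaviour that every query word must occur somewhere in the document, in any field and in any order. The `matches` function and the sample document are hypothetical, and the sketch only models which documents match, not how results are ranked.

```python
def matches(document, query):
    """Return True if every query word occurs in some field, in any order."""
    doc_words = set()
    for field_text in document.values():
        # Collect the words of all fields into one set; position is discarded.
        doc_words.update(field_text.lower().split())
    return all(word in doc_words for word in query.lower().split())


# Hypothetical document split into fields.
doc = {
    "name": "Search basics",
    "tags": "sql engine",
    "content": "how to speed up search",
}

print(matches(doc, "SQL engine speed search"))  # True
print(matches(doc, "search speed engine SQL"))  # True, same words reordered
print(matches(doc, "SQL engine latency"))       # False, "latency" is missing
```

Because the words are kept in a set, reordering the query cannot change the outcome, which is the point the answer makes.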
Last question. Is there a way to find text that contains an inaccuracy (there is an anagram in the word, a character is missing, or there is an extra character)? For example, take the word “address”, written only as “adres”. Can I force the misspelled word to be found without a lemmatizer and other utilities? How do I set this up in the indexer?
Not out of the box, but it’s usually not difficult to implement it the way you like in your application using the existing tools.
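As one possible application-side approach, here is a minimal sketch that uses Python's standard `difflib` module to map a misspelled word to the closest known term before sending the query. This is an assumption about how it could be done in the application, not a feature of the search engine; the `closest_terms` helper, the vocabulary list and the `0.75` cutoff are all hypothetical.

```python
import difflib


def closest_terms(word, vocabulary, cutoff=0.75):
    """Return up to three vocabulary terms similar enough to the given word."""
    return difflib.get_close_matches(word, vocabulary, n=3, cutoff=cutoff)


# Hypothetical list of terms known to the application.
vocabulary = ["address", "engine", "speed", "search"]

print(closest_terms("adres", vocabulary))  # ['address']
```

The application would then search for the corrected term instead of (or in addition to) the original one. A lower cutoff tolerates more errors but returns more false candidates.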
I can’t use auto-correction, because the word was only an example in my question. In my case, it’s more about finding numbers, such as production numbers.
From the intro to search, I understood that I would use fuzzy search to find words regardless of their order. Thank …
mysql> drop table if exists t; create table t(f text) charset_table='0..9' ignore_chars='U+20,-'; insert into t values(0, '8 993-126-53-85'); select * from t where match('8-9931265385'); select * from t where match('8 993-126-5385');
--------------
drop table if exists t
--------------
Query OK, 0 rows affected (0.00 sec)
--------------
create table t(f text) charset_table='0..9' ignore_chars='U+20,-'
--------------
Query OK, 0 rows affected (0.01 sec)
--------------
insert into t values(0, '8 993-126-53-85')
--------------
Query OK, 1 row affected (0.00 sec)
--------------
select * from t where match('8-9931265385')
--------------
+---------------------+-----------------+
| id                  | f               |
+---------------------+-----------------+
| 1514698463906889740 | 8 993-126-53-85 |
+---------------------+-----------------+
1 row in set (0.00 sec)
--------------
select * from t where match('8 993-126-5385')
--------------
+---------------------+-----------------+
| id                  | f               |
+---------------------+-----------------+
| 1514698463906889740 | 8 993-126-53-85 |
+---------------------+-----------------+
1 row in set (0.00 sec)
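
Both queries match because, with these settings, spaces and hyphens are dropped and the remaining digits form a single token. The Python sketch below is a simplified model of that behaviour for illustration only, assuming that characters listed in ignore_chars are removed and that anything outside charset_table acts as a separator. The `tokenize` function is hypothetical and is not the engine's tokenizer.

```python
import re

# Mirrors ignore_chars='U+20,-' from the example: these characters are dropped.
IGNORED_CHARS = {" ", "-"}


def tokenize(text):
    """Drop ignored characters, then keep runs of digits as tokens."""
    glued = "".join(ch for ch in text if ch not in IGNORED_CHARS)
    # Mirrors charset_table='0..9': only digits can be part of a token.
    return re.findall(r"[0-9]+", glued)


stored = tokenize("8 993-126-53-85")
print(stored)                                 # ['89931265385']
print(tokenize("8-9931265385") == stored)     # True
print(tokenize("8 993-126-5385") == stored)   # True
```

Under this model, any way of grouping the same digits with spaces or hyphens produces the same token, so it finds the same row.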