ERROR: table raw_hits: write error - indexer cannot finish job

johna · August 20, 2023, 9:10pm

Hi,
I have finally got indexer working for a plain index: a huge 100 GB, 70M-row table from MySQL.
The indexer starts fine, but at 58M rows it shows this error:

“ERROR: table ‘idx’: raw_hits: write error: 1730 of 1048390 bytes written.”

My server runs Debian with 32 GB of RAM and 250MB bandwidth.

Any advice or solutions?

barryhunter · August 21, 2023, 10:38am

Sounds like a full hard disk.

Note that indexer generally needs about 3x the disk space of the final data/index size. It stores a bunch of temporary files.

Seems like it might need 300 GB to be able to build the index.
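
A minimal Python sketch of that rule of thumb, for checking free space before starting a long indexing run. The 3x factor is only the estimate given above, the function names are made up for illustration, and the final index size has to be guessed (here it is assumed to be roughly the size of the source data).

```python
import shutil


def required_bytes(final_index_bytes, factor=3):
    """Estimated working space for indexer: final index size times a safety factor.

    The default factor of 3 is the rule of thumb from this thread, not a guarantee.
    """
    return final_index_bytes * factor


def has_room(path, final_index_bytes, factor=3):
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(final_index_bytes, factor)


# Example (the data directory is the one from the error message later in this thread):
#   has_room("/var/lib/manticore/data", 100 * 10**9)
```

For a 100 GB index this asks for 300 GB free on the volume that holds the index files.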

johna · August 21, 2023, 1:14pm

Thanks, I will try on another server to see if it works.

johna · August 21, 2023, 11:19pm

I have relaunched the index creation on a bigger server, and I have a new error:

ERROR: table ‘idx’: error opening ‘/var/lib/manticore/data/idx.tmp.spidx.0.tmp’: Too many open files.

I have tried deleting all the files in the folder, but the indexing operation still fails.
Any clues or advice?

tomat · August 22, 2023, 9:28am

Could you provide your index schema?
There could be a lot of attributes; these create many temp files, so it is worth increasing the fd limit for the user that starts indexer.

Please also provide the output of ulimit -a.
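
For reference, the same "open files" value that ulimit -a reports can be read from inside a process. This is a small Python sketch using the standard resource module (Unix only); the helper names are illustrative. Run it as the same user that starts indexer, since limits are per user and per session.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit the way ulimit does, with 'unlimited' for no limit."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = open_file_limits()
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

The soft value is what a process actually runs into; the hard value is the ceiling the soft value can be raised to without extra privileges.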

johna · August 22, 2023, 10:23am

Sure,

For the limits:

real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 241024
max locked memory (kbytes, -l) 7716789
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 241024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

For the SQL query:

Select ID, MEDIA, MEDIA_TYPE, BLOG_FOLLOWERS, LIKES, COMMENTS, ENG, MEDIA_DESC, EDITEUR, MEDIA_URL, DATE_AJOUT, CATEGORIE, KEYWORDS, IS_PREMIUM, URL, TITLE, IMG, AUTEUR, DATE_PUBLI, DATE_PUBLI_FAST, RESUME, ID_KEY, ANNEE, MOIS_NUM, PAYS, REGION, VILLE, LANGUE, HEURE, MOIS, FUSEAU_DELTA, HEURE_SEC, URL_LOGO_OK, LOGO_FILE, FACE_BOOK_LIKES, LINKEDIN_FOLLOWERS, INSTA_ABONNES, TWITTER_NB_ABONNES, PINTEREST_FOLLOWERS, YOUTUBE_ABONNES, YOUTUBE_CHANNEL_TOTAL_VIEWS, SIMILAR_CAT, SIMILAR_CAT_FR, SIMILAR_LAST_VISITS, SIMILAR_RANK_COUNTRY, SIMILAR_RANK_WORLD, SIMILAR_RANK_COUNTRY_NAME, SIMILAR_RANK_CATEGORY, TIKTOK_FOLLOWERS, TIKTOK_TOTAL_LIKE, TIKTOK_NB_VIDEOS, INSTAGRAM, IS_CMS, CMS, SIMILAR_DESC, TWITTER_PINED_TWEET_TXT, TWITTER_OK_PIC_LOGO_URL, SERVER_PAYS, SERVER_REGION, SERVER_VILLE, GPS_LAT, GPS_LON, PERSONS, LOCALITY, FACEBOOK, LINKEDIN, TWITTER, YOUTUBE, TIKTOK, COLORS, PAYS_EN, PAYS_FR, PAYS_ES, PAYS_GE, PAYS_IT, LANGUE_CODE, LANGUE_EN, LANGUE_FR, LANGUE_ES, LANGUE_GE, LANGUE_IT, MEDIA_FORMAT, MEDIA_AUDIENCE_PAYS_FR, MEDIA_AUDIENCE_PAYS_EN, MEDIA_AUDIENCE_PAYS_ES, MEDIA_AUDIENCE_PAYS_GE, MEDIA_AUDIENCE_PAYS_IT, MEDIA_CATEGORIE, ADMIN_ETAT_REGION, ADMIN_VILLE, EMAIL_STUDIO, OWNER, BROADCAST_RADIO_FORMAT, TEL_RADIO_RECEPTION, TEL_STUDIO, TEL_SUBSCRIPTION, DATE_CREATION, DATE_CREATION_ANNEE, AUDIENCE_PAYS from myDB

for attributes :

sql_attr_bigint = id
sql_field_string = MEDIA
sql_field_string = EDITEUR
sql_field_string = MEDIA_URL
sql_field_string = DATE_AJOUT
sql_field_string = CATEGORIE
sql_field_string = KEYWORDS
sql_field_string = IS_PREMIUM
sql_field_string = URL
sql_field_string = TITLE
sql_field_string = MEDIA_TYPE
sql_field_string = SIMILAR_DESC
sql_field_string = TWITTER_PINED_TWEET_TXT
sql_field_string = TWITTER_OK_PIC_LOGO_URL
sql_field_string = SERVER_PAYS
sql_field_string = SERVER_REGION
sql_field_string = SERVER_VILLE
sql_field_string = GPS_LAT
sql_field_string = GPS_LON
sql_field_string = PERSONS
sql_field_string = LOCALITY
sql_field_string = FACEBOOK
sql_field_string = LINKEDIN
sql_field_string = TWITTER
sql_field_string = YOUTUBE
sql_field_string = TIKTOK
sql_field_string = COLORS
sql_field_string = PAYS_EN
sql_field_string = PAYS_FR
sql_field_string = PAYS_ES
sql_field_string = PAYS_GE
sql_field_string = PAYS_IT
sql_field_string = LANGUE_CODE
sql_field_string = LANGUE_EN
sql_field_string = LANGUE_FR
sql_field_string = LANGUE_ES
sql_field_string = LANGUE_GE
sql_field_string = LANGUE_IT
sql_field_string = MEDIA_FORMAT
sql_field_string = MEDIA_AUDIENCE_PAYS_FR
sql_field_string = MEDIA_AUDIENCE_PAYS_EN
sql_field_string = MEDIA_AUDIENCE_PAYS_ES
sql_field_string = MEDIA_AUDIENCE_PAYS_GE
sql_field_string = MEDIA_AUDIENCE_PAYS_IT
sql_field_string = MEDIA_CATEGORIE
sql_field_string = ADMIN_ETAT_REGION
sql_field_string = ADMIN_VILLE
sql_field_string = EMAIL_STUDIO
sql_field_string = OWNER
sql_field_string = BROADCAST_RADIO_FORMAT
sql_field_string = TEL_RADIO_RECEPTION
sql_field_string = TEL_STUDIO
sql_field_string = TEL_SUBSCRIPTION
sql_field_string = DATE_CREATION
sql_field_string = DATE_CREATION_ANNEE
sql_field_string = AUDIENCE_PAYS

for searchd :

searchd
{

log = /var/log/manticore/searchd.log
query_log = /var/log/manticore/query.log
pid_file = /var/run/manticore/searchd.pid

query_log_format = sphinxql

}

How can I force the engine to get past this bug? Thanks, guys, for your valuable help.

Sergey · August 25, 2023, 3:23am

open files (-n) 1024 may be too low for so many attributes, since during indexing each attribute leads to a separate temporary file. Increase the open files limit. "Manticore Search Manual: Server settings > Searchd" can help, but you may also have to increase the hard limit; see "How to Increase Number of Open Files Limit in Linux".
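
One way to apply this without touching system-wide settings is a small wrapper that raises the soft limit for itself and then starts indexer as a child process, which inherits the limit. This is only a sketch: the function names and the 65536 target are assumptions, and it cannot go above the hard limit (raising that normally needs root or a system limits configuration change).

```python
import resource
import subprocess
import sys


def raise_open_file_limit(wanted):
    """Raise the soft open-files limit toward `wanted`, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The kernel refused the new value; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


def run_with_limit(command, wanted=65536):
    """Raise the soft limit, then run `command`; the child inherits the limit."""
    new_soft = raise_open_file_limit(wanted)
    print(f"soft open-files limit is now {new_soft}", file=sys.stderr)
    return subprocess.run(command).returncode
```

It could be called as run_with_limit(["indexer", "--config", "/etc/manticoresearch/manticore.conf", "idx"]), where the config path is an assumption and should be replaced with the real one. If the reported soft limit is still 1024 afterwards, the hard limit is the bottleneck and has to be raised at the system level first.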

johna · September 8, 2023, 12:06pm

Thanks a lot for your help