Is there a way to query just the RAM chunk (all segments)?

barryhunter · August 27, 2022, 12:46pm

It seems there is a somewhat hidden syntax, SELECT COUNT(*) FROM index.0, to search just the first disk chunk. Past the disk chunks, the numbering seems to continue into each RAM segment.

I don’t seem to be able to do select count(*) from gridimage_group_stat.2, gridimage_group_stat.3 (assuming two disk chunks and two RAM segments).

For background, I’m wondering whether it would be feasible to ‘auto-shard’ an RT index across nodes. E.g. the Helm chart currently uses all worker nodes as mirrors, but conceptually it seems possible to search a different shard on each node (assuming exactly three shards/segments):

agent = node0:9312:index.0
agent = node1:9312:index.1
agent = node2:9312:index.2

But there can be up to 32 segments (AFAIK), and they can change particularly quickly (whenever the node decides to optimize segments), so it would be nice to just treat the RAM chunk as one entity.

Conceptually, something like SELECT COUNT(*) FROM index.ram, or even SELECT COUNT(*) FROM index.2-27 (assuming 25 segments!).

The catch, though, is that you can’t search multiple shards at once:

select count(*) from gridimage_group_stat.1, gridimage_group_stat.0
ERROR 1064 (42000): sphinxql: syntax error, unexpected ',', expecting $end near ', gridimage_group_stat.0'

That does suggest the whole idea falls apart anyway once there are more shards than nodes, i.e. I wouldn’t be able to do:

agent = node0:9312:index.0,index.3
agent = node1:9312:index.1,index.4
agent = node2:9312:index.2

Although I don’t even know whether the ‘index.shard’ syntax works in an ‘agent’ definition in the first place.
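The more-shards-than-nodes layout above is essentially a round-robin assignment: shard i goes to node i mod (number of nodes). A minimal C++ sketch of generating such agent lines, assuming a hypothetical helper roundRobinAgents and node addresses given as host:port. Whether an agent actually accepts the index.N chunk syntax is unconfirmed in this thread, so the output is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: spreads shard numbers 0..shardCount-1 over the nodes
// round-robin and renders one "agent = host:port:list" line per node.
// The "indexName.N" shard naming is the thread's proposal, not a confirmed
// agent syntax.
std::vector<std::string> roundRobinAgents(const std::vector<std::string>& nodes,
                                          const std::string& indexName,
                                          int shardCount) {
    assert(!nodes.empty() && shardCount >= 0);
    // One comma-separated shard list per node.
    std::vector<std::string> shardLists(nodes.size());
    for (int shard = 0; shard < shardCount; ++shard) {
        std::string& list = shardLists[static_cast<std::size_t>(shard) % nodes.size()];
        if (!list.empty()) list += ',';
        list += indexName + '.' + std::to_string(shard);
    }
    std::vector<std::string> lines;
    for (std::size_t n = 0; n < nodes.size(); ++n) {
        if (shardLists[n].empty()) continue;  // more nodes than shards: no line for this node
        lines.push_back("agent = " + nodes[n] + ':' + shardLists[n]);
    }
    return lines;
}
```

With nodes node0:9312, node1:9312 and node2:9312 and five shards, this yields the same three lines as the example above: node0 gets index.0,index.3, node1 gets index.1,index.4 and node2 gets index.2. Since segment counts change quickly, any such generated lines would have to be regenerated whenever they do.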

tomat · August 28, 2022, 8:52pm

You could enumerate all the disk chunks and segments in the index name, i.e.

select * from index.0.1.2.3.4.5

The disk chunk vs. RAM segment selection code is in the FilterReaderChunks function:

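	// numbers below this bound refer to disk chunks
	auto iDiskBound = tOrigin.m_pChunks->GetLength();
	// numbers below this bound (and not below iDiskBound) refer to RAM segments
	auto iAllBound = iDiskBound + tOrigin.m_pSegs->GetLength();
	dOrderedChunks.any_of ( [&] ( int iVal )
	{
		if ( iVal<iDiskBound ) // 0 .. disk_chunks-1: a disk chunk
		{
			++iDiskSelected;
			return false;
		}
		if ( iVal<iAllBound ) // the following ram_chunk_segments_count numbers: RAM segments
		{
			++iRamSelected;
			return false;
		}
		return true; // out of range: any_of stops here
	});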
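For experimenting outside Manticore, here is a minimal standalone C++ sketch of the same bounds check, assuming the disk chunk and RAM segment counts are passed in as plain integers. The names ChunkSelection and classifyChunks are hypothetical, not Manticore's, and the any_of early exit is written as an explicit loop with break.

```cpp
#include <cassert>
#include <vector>

// Outcome of classifying the chunk numbers given after the table name.
struct ChunkSelection {
    int diskSelected = 0;     // numbers in [0, diskChunks)
    int ramSelected = 0;      // numbers in [diskChunks, diskChunks + ramSegments)
    bool outOfRange = false;  // set when a number is >= diskChunks + ramSegments
};

// Same bounds logic as the excerpt: disk chunks are numbered first,
// RAM segments follow, and the scan stops at the first number past the end.
ChunkSelection classifyChunks(const std::vector<int>& orderedChunks,
                              int diskChunks, int ramSegments) {
    assert(diskChunks >= 0 && ramSegments >= 0);
    const int diskBound = diskChunks;
    const int allBound = diskBound + ramSegments;
    ChunkSelection sel;
    for (int val : orderedChunks) {
        if (val < diskBound) { ++sel.diskSelected; continue; }
        if (val < allBound) { ++sel.ramSelected; continue; }
        sel.outOfRange = true;  // equivalent to the any_of callback returning true
        break;
    }
    return sel;
}
```

For the two-disk-chunk, two-segment case from the question, classifyChunks({0, 1, 2, 3}, 2, 2) counts two disk chunks and two RAM segments, while {2, 3} selects only the RAM segments.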

But you cannot set a range in the index name, only exact numbers.

Numbers after the index name from 0 up to disk_chunks-1 select a disk chunk; the numbers after that select a particular RAM segment, up to ram_chunk_segments_count of them.
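Since only exact numbers are accepted, the closest thing to an "index.ram" today is to list every RAM segment number explicitly. A minimal C++ sketch, assuming the two counts are read beforehand from the disk_chunks and ram_chunk_segments_count status fields; chunkRangeName and ramOnlyName are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "name.first.(first+1)...last" for the inclusive
// range [first, last], because the chunk syntax takes no ranges.
std::string chunkRangeName(const std::string& name, int first, int last) {
    assert(first >= 0 && first <= last);
    std::string out = name;
    for (int i = first; i <= last; ++i) {
        out += '.';
        out += std::to_string(i);
    }
    return out;
}

// Hypothetical helper: selects only the RAM segments, which are numbered
// directly after the disk chunks.
std::string ramOnlyName(const std::string& name, int diskChunks, int ramSegments) {
    assert(diskChunks >= 0 && ramSegments > 0);
    return chunkRangeName(name, diskChunks, diskChunks + ramSegments - 1);
}
```

For two disk chunks and two segments, ramOnlyName("gridimage_group_stat", 2, 2) gives gridimage_group_stat.2.3. Both counts can change between reading them and running the query, so this is only dependable for debugging.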

However, that was developed for debugging only, not for production, since the disk chunk count changes with OPTIMIZE and the RAM segment count can change on every insert.

barryhunter · August 29, 2022, 11:52am

Thanks for the index.0.1.2 syntax, that’s useful to know (even if just for debugging).

It’s the particularly fast-changing nature of the segments that made me hope I could just reference the entire RAM chunk as one. By comparison, the disk chunks change less often.

But I agree there are still issues with using this in ‘production’, not least dealing with ‘sync’ (i.e. keeping the distributed index in sync with all the agents). I also imagine that accessing chunks directly would bypass the effect of the kill-list etc. from deleting documents. Even so, it might be possible with an index that doesn’t see deletes (only inserts and updates).