repository: add obj_offset/obj_size range-load params to Store.load calls#9744
mr-raj12 wants to merge 1 commit
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9744      +/-   ##
==========================================
- Coverage   84.76%   84.72%   -0.05%
==========================================
  Files          92       92
  Lines       15047    15053       +6
  Branches     2250     2252       +2
==========================================
- Hits        12755    12754       -1
- Misses       1592     1597       +5
- Partials      700      702       +2
Force-pushed from 0e81162 to 2393263.
From the PR comment:
This is rather confusing. For the read_data=False path, the code first reads 1 KB, on the assumption that this usually covers the header and all the metadata. Overshooting into the next object is not a problem: the parse_meta function reads only the metadata, using the correct length from the header, and ignores the trailing bytes. If meta_size in the header shows that we did not read enough data, we make a second attempt, this time with exactly the size we need. The whole point of doing it this way is to avoid reading just the header (a few bytes) and then issuing another read for just the few bytes of metadata, which would double the latency.
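The read strategy described in this comment can be sketched as follows. This is a minimal illustration, not the code in repository.py: the header layout (two little-endian 32-bit sizes), the CountingStore class, and the names make_obj and read_meta are assumptions made for the example.

```python
import struct

# Assumed header layout for this sketch: meta_size, data_size (little-endian uint32).
HEADER = struct.Struct("<II")
INITIAL_READ = 1024  # optimistic first read: header plus, usually, all metadata


def make_obj(meta: bytes, data: bytes) -> bytes:
    """Build a toy object: header, then metadata, then data."""
    return HEADER.pack(len(meta), len(data)) + meta + data


class CountingStore:
    """Hypothetical store holding one blob; counts how often load() is called."""

    def __init__(self, blob: bytes):
        self.blob = blob
        self.loads = 0

    def load(self, offset=0, size=None):
        self.loads += 1
        end = len(self.blob) if size is None else offset + size
        return self.blob[offset:end]


def read_meta(store) -> bytes:
    """Return only the metadata, using one read when possible and two at most."""
    buf = store.load(offset=0, size=INITIAL_READ)
    meta_size, _data_size = HEADER.unpack(buf[:HEADER.size])
    needed = HEADER.size + meta_size
    if len(buf) < needed:
        # The first read was too short: retry with exactly the required size.
        buf = store.load(offset=0, size=needed)
    # Any bytes past `needed` (data, or the next object) are ignored.
    return buf[HEADER.size:needed]
```

With small metadata the second load never happens, which is the latency saving the comment refers to; only unusually large metadata pays for two round trips.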
The min() on retry_size guards against a corrupted meta_size; it is a no-op for healthy objects. The get_many TODO names fetch_many as also affected, and ChunkIndex ownership is listed as one option.
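A rough sketch of the clamping logic mentioned here, under stated assumptions: the function names, the 8-byte header size, and the 1024-byte initial read are chosen for illustration and are not taken from the patch itself.

```python
INITIAL_READ = 1024  # size of the optimistic first read
HEADER_SIZE = 8      # assumed header size for this sketch


def first_read_size(obj_size=None):
    """Size of the first load: never more than the object is known to contain."""
    if obj_size is None:
        return INITIAL_READ
    return min(INITIAL_READ, obj_size)


def retry_read_size(meta_size, obj_size=None):
    """Size of the retry load: header plus metadata, clamped to obj_size if known."""
    wanted = HEADER_SIZE + meta_size
    if obj_size is None:
        return wanted
    # For a well-formed object, wanted <= obj_size, so min() changes nothing.
    # A corrupted meta_size cannot push the read beyond the object's own bytes.
    return min(wanted, obj_size)
```

The clamp only matters when the header is damaged; in that case the read stays inside the object instead of pulling an arbitrary amount of data from the pack.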
Force-pushed from 2393263 to e4b4f47.
Description
get() now accepts obj_offset=0 and obj_size=None and forwards them to all store.load() calls. The values come from PackWriter's flush results; once update_pack_info() has written them into the chunk index, callers can fetch a single chunk out of a multi-chunk pack without loading the whole file.

When obj_size is set, the read_data=False path caps the initial 1 KB load to obj_size so the store is not asked for more bytes than the object contains. The retry load is also capped as a guard against a corrupted meta_size in the header; for a well-formed object the clamp is a no-op.

get_many() gets a TODO comment: once N>1 packs land, it and fetch_many will need per-id (pack_id, obj_offset, obj_size) tuples to range-load each chunk.

Changes:
- repository.py: get() takes obj_offset=0, obj_size=None; both read_data branches pass offset=/size= to store.load().
- testsuite/repository_test.py: test_get_with_range writes a two-chunk pack via store_store, then range-loads each chunk by offset and size.

refs #8572
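To make the range-load idea concrete, here is a small self-contained sketch of fetching individual chunks out of a multi-chunk pack by offset and size. Everything in it is hypothetical: MemoryStore is an in-memory stand-in rather than the real store, and write_pack, get, and get_many only mimic the shape of the calls described above.

```python
class MemoryStore:
    """Hypothetical in-memory key/value store that supports range loads."""

    def __init__(self):
        self._items = {}

    def store(self, name, value):
        self._items[name] = bytes(value)

    def load(self, name, *, offset=0, size=None):
        blob = self._items[name]
        end = len(blob) if size is None else offset + size
        return blob[offset:end]


def write_pack(store, pack_id, chunks):
    """Concatenate chunks into one pack.

    Returns {chunk_id: (pack_id, obj_offset, obj_size)}, playing the role
    the chunk index would play once it records pack locations.
    """
    index = {}
    buf = bytearray()
    for chunk_id, payload in chunks.items():
        index[chunk_id] = (pack_id, len(buf), len(payload))
        buf += payload
    store.store(pack_id, buf)
    return index


def get(store, index, chunk_id):
    """Range-load a single chunk instead of reading the whole pack."""
    pack_id, obj_offset, obj_size = index[chunk_id]
    return store.load(pack_id, offset=obj_offset, size=obj_size)


def get_many(store, index, chunk_ids):
    """Yield chunks in request order, one range load per id."""
    for chunk_id in chunk_ids:
        yield get(store, index, chunk_id)
```

A usage example: after `index = write_pack(store, "pack-0", {"a": b"first chunk", "b": b"second"})`, calling `get(store, index, "b")` reads only the bytes of the second chunk. The per-id tuple lookup in get_many is the piece the TODO says will be needed once packs hold more than one object.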
Checklist
master