repository: add obj_offset/obj_size range-load params to Store.load calls #9744

Open
mr-raj12 wants to merge 1 commit into borgbackup:master from mr-raj12:pack-files-step6-range-load

mr-raj12 (Contributor) commented Jun 9, 2026

Description

get() now accepts obj_offset=0 and obj_size=None and forwards them to all store.load() calls. The values come from PackWriter's flush results; once update_pack_info() has written them into the chunk index, callers can fetch a single chunk out of a multi-chunk pack without loading the whole file.

When obj_size is set, the read_data=False path caps the initial 1 KB load to obj_size so the store is not asked for more bytes than the object contains. The retry load is also capped as a guard against a corrupted meta_size in the header; for a well-formed object the clamp is a no-op.
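The clamping described above can be sketched roughly as follows. This is an illustration, not the code from the PR: `ToyStore`, `make_obj`, `get_meta`, and the `<II` header layout (meta_size, data_size) are all assumptions made so the example is self-contained.

```python
import struct

HEADER = struct.Struct("<II")  # assumed layout: (meta_size, data_size)
INITIAL_READ = 1024            # size of the first, speculative read


class ToyStore:
    """Hypothetical in-memory stand-in for a store that supports range loads."""

    def __init__(self):
        self.files = {}
        self.requested = []  # the size argument of every load() call

    def load(self, name, *, offset=0, size=None):
        self.requested.append(size)
        blob = self.files[name]
        end = len(blob) if size is None else offset + size
        return blob[offset:end]


def make_obj(meta, data):
    """Serialize one object as header + meta + data."""
    return HEADER.pack(len(meta), len(data)) + meta + data


def get_meta(store, name, obj_offset=0, obj_size=None):
    """Return header + metadata of one object, never asking for more than obj_size."""
    first = INITIAL_READ if obj_size is None else min(INITIAL_READ, obj_size)
    buf = store.load(name, offset=obj_offset, size=first)
    if len(buf) < HEADER.size:
        raise ValueError("short header")
    meta_size, _ = HEADER.unpack(buf[:HEADER.size])
    needed = HEADER.size + meta_size
    if needed > len(buf):
        # Retry with the size taken from the header, clamped in case meta_size is corrupt.
        retry = needed if obj_size is None else min(needed, obj_size)
        buf = store.load(name, offset=obj_offset, size=retry)
    if len(buf) < needed:
        raise ValueError("truncated metadata")
    return buf[:needed]
```

With two objects stored back to back in one file, `get_meta(store, "pack", len(first_obj), len(second_obj))` requests at most `len(second_obj)` bytes, starting at the second object's offset.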

get_many() gets a TODO comment: once N>1 packs land, it and fetch_many will need per-id (pack_id, obj_offset, obj_size) tuples to range-load each chunk.
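One possible shape for the per-id tuples mentioned in the TODO is sketched below. It is only an illustration under stated assumptions: `ChunkLocation`, `write_pack`, and this `get_many` signature are hypothetical, and packs are modelled as plain byte strings in a dict rather than as files in a real store.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class ChunkLocation(NamedTuple):
    """Hypothetical index entry: where one chunk lives inside a pack."""
    pack_id: str
    obj_offset: int
    obj_size: int


def write_pack(chunks: Dict[str, bytes], pack_id: str) -> Tuple[bytes, Dict[str, ChunkLocation]]:
    """Concatenate chunks into one pack and record each chunk's offset and size."""
    pack = bytearray()
    locations = {}
    for chunk_id, payload in chunks.items():
        locations[chunk_id] = ChunkLocation(pack_id, len(pack), len(payload))
        pack += payload
    return bytes(pack), locations


def get_many(packs: Dict[str, bytes], index: Dict[str, ChunkLocation],
             ids: Iterable[str]) -> Iterator[bytes]:
    """Yield each requested chunk by slicing only its byte range out of its pack."""
    for chunk_id in ids:
        loc = index[chunk_id]
        blob = packs[loc.pack_id]
        yield blob[loc.obj_offset:loc.obj_offset + loc.obj_size]
```

In a real repository the slice would presumably be replaced by a range load against the store, and the index would be the chunk index rather than a dict.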

Changes:

  • repository.py: get() takes obj_offset=0, obj_size=None; both read_data branches pass offset=/size= to store.load().
  • testsuite/repository_test.py: test_get_with_range writes a two-chunk pack via store_store, then range-loads each chunk by offset and size.

refs #8572

Checklist

  • PR is against master
  • New code has tests and docs where appropriate
  • Tests pass
  • Commit messages are clean and reference related issues

codecov Bot commented Jun 9, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.72%. Comparing base (35eff1c) to head (e4b4f47).
⚠️ Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/borg/repository.py | 40.00% | 5 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9744      +/-   ##
==========================================
- Coverage   84.76%   84.72%   -0.05%     
==========================================
  Files          92       92              
  Lines       15047    15053       +6     
  Branches     2250     2252       +2     
==========================================
- Hits        12755    12754       -1     
- Misses       1592     1597       +5     
- Partials      700      702       +2     


mr-raj12 force-pushed the pack-files-step6-range-load branch from 0e81162 to 2393263 on June 9, 2026 17:22
ThomasWaldmann (Member) commented:

From the PR comment:

The read_data=False path clamps both load sizes to obj_size when set. Right now with N=1 packs this changes nothing (one chunk per file), but once N>1 packs land an unclamped size would overshoot into the next chunk, so the clamp goes in now.

This is rather confusing.

The read_data=False path first reads 1 KB, assuming that this usually contains the header and all the metadata. An overshoot into the next object is no problem, because the parse_meta function only reads the metadata using the correct length from the header and ignores the trailing bytes.

If meta_size in the header shows that we did not read enough data, we make a second attempt, this time with exactly the correct size for what we need.

The whole point of doing it this way is to avoid reading just the header (a few bytes) and then having to do another read for just the few bytes of metadata, which would cost 2x the latency.
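The latency argument can be made concrete with a small sketch that counts round trips. It is an illustration only: `make_loader`, `meta_header_first`, `meta_speculative`, `make_obj`, and the `<II` header layout (meta_size, data_size) are assumptions, not borg's actual code.

```python
import struct

HEADER = struct.Struct("<II")  # assumed layout: (meta_size, data_size)


def make_obj(meta, data):
    """Serialize one object as header + meta + data."""
    return HEADER.pack(len(meta), len(data)) + meta + data


def make_loader(blob):
    """Return a range-load function over blob plus a log of the calls made to it."""
    calls = []

    def load(offset, size):
        calls.append((offset, size))
        return blob[offset:offset + size]

    return load, calls


def meta_header_first(load, offset):
    """Naive approach: one read for the header, a second one for the metadata."""
    hdr = load(offset, HEADER.size)
    meta_size, _ = HEADER.unpack(hdr)
    return load(offset + HEADER.size, meta_size)


def meta_speculative(load, offset, guess=1024):
    """Read `guess` bytes up front; re-read only if the metadata did not fit."""
    buf = load(offset, guess)
    meta_size, _ = HEADER.unpack(buf[:HEADER.size])
    end = HEADER.size + meta_size
    if end > guess:
        buf = load(offset, end)  # second attempt with exactly the needed size
    # Bytes past `end` (data, or even the next object) are simply ignored.
    return buf[HEADER.size:end]
```

For an object with small metadata, `meta_speculative` makes one call where `meta_header_first` makes two. Only when the metadata exceeds the initial guess does the speculative variant also need a second call.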

mr-raj12 changed the title from "repository: add obj_offset/obj_size range-load params to get()" to "repository: add obj_offset/obj_size range-load params to Store.load calls" on Jun 9, 2026
retry_size min() guards against corrupted meta_size; no-op for healthy objects.
get_many TODO names fetch_many as also affected; ChunkIndex ownership listed as one option.
mr-raj12 force-pushed the pack-files-step6-range-load branch from 2393263 to e4b4f47 on June 9, 2026 18:06
2 participants