test: Add missing test coverage for chunking module edge cases #430

@vedssharma

Problem

tests/chunking_test.py has no coverage for several code paths in chunking.py:

  • Guard clauses in SentenceIterator.__init__ (IndexError for negative or out-of-range curr_token_pos) are never exercised
  • create_token_interval, get_token_interval_text, and get_char_interval all have ValueError / TokenUtilError error paths with zero tests
  • ChunkIterator constructor edge cases (both text and document being None, text=None falling back to document.text, empty TokenizedText triggering re-tokenization) are untested
  • TextChunk.chunk_text and TextChunk.char_interval raise ValueError when document is None — untested
  • TextChunk.sanitized_chunk_text is entirely untested
  • The lazy caching of _chunk_text and _char_interval is never verified
  • make_batches_of_textchunk is only tested with one specific batch size
  • The broken_sentence flag reset — which controls whether subsequent sentences are merged after a mid-sentence chunk break — has no dedicated test
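To make the ValueError and lazy-caching items concrete, here is a minimal sketch of what such tests could look like. StandInChunk is a hypothetical stand-in written for this example, not the real TextChunk; its constructor arguments, the compute_calls counter, and the string-slicing logic are all assumptions. Only the idea (raise when document is None, compute once and cache in _chunk_text) comes from the list above.

```python
import unittest


class StandInChunk:
    """Hypothetical stand-in for TextChunk; not the real class."""

    def __init__(self, start, end, document=None):
        self.start = start
        self.end = end
        self.document = document  # assumed here to be a plain string
        self._chunk_text = None   # filled in on first access
        self.compute_calls = 0    # lets a test count recomputations

    @property
    def chunk_text(self):
        if self.document is None:
            raise ValueError("document must be set to read chunk_text")
        if self._chunk_text is None:
            self.compute_calls += 1
            self._chunk_text = self.document[self.start:self.end]
        return self._chunk_text


class ChunkTextTest(unittest.TestCase):

    def test_raises_without_document(self):
        chunk = StandInChunk(0, 5)
        with self.assertRaises(ValueError):
            _ = chunk.chunk_text

    def test_value_is_cached_after_first_access(self):
        chunk = StandInChunk(0, 5, document="hello world")
        # Nothing is computed before the property is read.
        self.assertIsNone(chunk._chunk_text)
        self.assertEqual(chunk.chunk_text, "hello")
        self.assertEqual(chunk.chunk_text, "hello")
        # The second read must come from the cache.
        self.assertEqual(chunk.compute_calls, 1)
```

Against the real module, the same check would more likely patch whatever helper computes the text and assert it was called once, instead of relying on a counter attribute.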

Proposed fix

Add tests covering all of the above in tests/chunking_test.py, organized into focused test classes per concern.
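A rough sketch of the "one test class per concern" layout, assuming unittest. make_batches and check_token_pos are hypothetical stand-ins defined only so the example runs on its own; they are not the signatures of make_batches_of_textchunk or SentenceIterator.__init__, and the exact bounds the real guard enforces are an assumption.

```python
import itertools
import unittest


def make_batches(items, batch_size):
    """Hypothetical stand-in for make_batches_of_textchunk."""
    iterator = iter(items)
    while batch := list(itertools.islice(iterator, batch_size)):
        yield batch


def check_token_pos(tokens, curr_token_pos):
    """Hypothetical guard; the real bounds check may differ."""
    if curr_token_pos < 0 or curr_token_pos > len(tokens):
        raise IndexError(f"curr_token_pos {curr_token_pos} is out of range")


class GuardClauseTest(unittest.TestCase):

    def test_negative_and_too_large_positions_raise(self):
        tokens = ["a", "b", "c"]
        for pos in (-1, len(tokens) + 1):
            with self.subTest(pos=pos):
                with self.assertRaises(IndexError):
                    check_token_pos(tokens, pos)


class BatchSizeTest(unittest.TestCase):

    def test_several_batch_sizes(self):
        items = list(range(5))
        for batch_size in (1, 2, 5, 7):
            with self.subTest(batch_size=batch_size):
                batches = list(make_batches(items, batch_size))
                # Nothing is lost or reordered.
                self.assertEqual([x for b in batches for x in b], items)
                # Every batch except possibly the last is full.
                self.assertTrue(all(len(b) == batch_size for b in batches[:-1]))
                self.assertLessEqual(len(batches[-1]), batch_size)

    def test_empty_input_yields_no_batches(self):
        self.assertEqual(list(make_batches([], 3)), [])
```

Using subTest keeps each batch size reported separately on failure, which addresses the "only one specific batch size" gap without one test method per size.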
