Module ingestion

Ingestion entrypoints and implementations.

Most callers should use ingest_from_path (from unified), the unified ingestion entrypoint.

For append-only ordered batches, use ingest_from_ordered_paths, which concatenates rows and then applies the watermark filter once. For stable directory listings, see paths_from_directory_scan and the partition module docs on deterministic ordering.
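The "concatenate, then filter once" behaviour and the idea of a stable listing can be sketched in isolation. This is a minimal illustration, not the crate's implementation: Row, concat_then_filter, and stable_order are hypothetical names, and the strictly-greater-than comparison and lexicographic sort are assumptions.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for an ingested row; not a type from this crate.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u64,
    updated_at: i64, // the column used as the watermark
}

/// Concatenate batches in the order given, then keep only rows whose
/// watermark column is strictly above `watermark` (an assumed comparison).
/// The filter runs once over the combined rows rather than once per batch.
fn concat_then_filter(batches: Vec<Vec<Row>>, watermark: Option<i64>) -> Vec<Row> {
    let combined = batches.into_iter().flatten();
    match watermark {
        Some(w) => combined.filter(|r| r.updated_at > w).collect(),
        None => combined.collect(),
    }
}

/// Sort paths lexicographically so that repeated scans of the same
/// directory produce the same order (one possible deterministic ordering).
fn stable_order(mut paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths.sort();
    paths
}
```

Filtering after concatenation keeps the watermark logic in one place, so every batch is held to the same threshold.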

For ergonomic configuration, prefer IngestionOptionsBuilder over constructing IngestionOptions directly.
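To show why a builder tends to be more ergonomic than filling in every field by hand, here is a generic builder-pattern sketch. DemoOptions, DemoOptionsBuilder, and all of their fields and defaults are hypothetical; they do not reflect the actual fields or methods of IngestionOptions or IngestionOptionsBuilder.

```rust
// Hypothetical options struct; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct DemoOptions {
    has_header: bool,
    delimiter: u8,
    watermark_column: Option<String>,
}

// Hypothetical builder: unset fields fall back to defaults in `build`.
#[derive(Debug, Default)]
struct DemoOptionsBuilder {
    has_header: Option<bool>,
    delimiter: Option<u8>,
    watermark_column: Option<String>,
}

impl DemoOptionsBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn has_header(mut self, value: bool) -> Self {
        self.has_header = Some(value);
        self
    }

    fn delimiter(mut self, value: u8) -> Self {
        self.delimiter = Some(value);
        self
    }

    fn watermark_column(mut self, name: impl Into<String>) -> Self {
        self.watermark_column = Some(name.into());
        self
    }

    /// Resolve unset fields to assumed defaults (header row, comma delimiter).
    fn build(self) -> DemoOptions {
        DemoOptions {
            has_header: self.has_header.unwrap_or(true),
            delimiter: self.delimiter.unwrap_or(b','),
            watermark_column: self.watermark_column,
        }
    }
}
```

A caller sets only what differs from the defaults, for example DemoOptionsBuilder::new().delimiter(b';').build(), and new fields can be added later without breaking existing call sites.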

Format-specific functions are also available under the csv, json, parquet, and excel modules.

Re-exports

pub use builder::IngestionOptionsBuilder;
pub use observability::CompositeObserver;
pub use observability::FileObserver;
pub use observability::IngestionContext;
pub use observability::IngestionObserver;
pub use observability::IngestionSeverity;
pub use observability::IngestionStats;
pub use observability::StdErrObserver;
pub use partition::PartitionSegment;
pub use partition::PartitionedFile;
pub use partition::discover_hive_partitioned_files;
pub use partition::hive_segments_for_relative_parent;
pub use partition::parse_partition_segment;
pub use partition::paths_from_directory_scan;
pub use partition::paths_from_explicit_list;
pub use partition::paths_from_glob;
pub use unified::ExcelSheetSelection;
pub use unified::IngestionFormat;
pub use unified::IngestionOptions;
pub use unified::IngestionRequest;
pub use unified::OrderedBatchIngestMetadata;
pub use unified::infer_schema_from_path;
pub use unified::ingest_from_ordered_paths;
pub use unified::ingest_from_path;
pub use unified::ingest_from_path_infer;
pub use watermark::apply_watermark_after_ingest;
pub use watermark::apply_watermark_filter;
pub use watermark::max_value_in_column;
pub use watermark::validate_watermark_config;
pub use db::ingest_from_db;
pub use db::ingest_from_db_infer;

Modules

builder
csv
CSV ingestion implementation.
db
Direct DB ingestion stubs when db_connectorx is disabled.
excel
Excel ingestion stubs when the excel feature is disabled.
json
JSON ingestion implementation.
observability
parquet
Parquet ingestion implementation.
partition
Hive-style partition path discovery and helpers to resolve glob patterns or explicit file lists — single-process only (no distributed coordinator).
unified
Unified ingestion entrypoint.
watermark
High-water / incremental row filter applied after ingest (file or DB).
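
As an illustration of what Hive-style partition paths look like, the sketch below splits key=value directory segments out of a relative file path. It assumes the common key=value directory convention; parse_segment and segments_for_path are hypothetical helpers and are not the crate's parse_partition_segment or hive_segments_for_relative_parent, whose signatures are not shown here.

```rust
use std::path::Path;

/// Parse one directory name of the form `key=value`.
/// Returns None when there is no `=` or the key is empty (an assumed rule).
fn parse_segment(segment: &str) -> Option<(String, String)> {
    let (key, value) = segment.split_once('=')?;
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Collect partition segments from the parent directories of a relative
/// file path, in path order. The file name itself is ignored, and any
/// directory that is not `key=value` is skipped.
fn segments_for_path(relative: &Path) -> Vec<(String, String)> {
    relative
        .parent()
        .into_iter()
        .flat_map(|parent| parent.components())
        .filter_map(|component| component.as_os_str().to_str())
        .filter_map(parse_segment)
        .collect()
}
```

With this sketch, a path such as year=2024/month=01/part-0.parquet yields the pairs (year, 2024) and (month, 01), which a caller could attach to each row as partition columns.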