Ingestion entrypoints and implementations.
Most callers should use ingest_from_path (from unified) which:
- auto-detects format by file extension (or you can override via IngestionOptions)
- performs ingestion into an in-memory crate::types::DataSet
- optionally reports success/failure/alerts to an IngestionObserver
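The extension-based detection with an explicit override can be pictured roughly as follows. This is a minimal sketch, not the crate's implementation: `SketchFormat`, `detect_format`, and the extension-to-format mapping are hypothetical names and assumptions introduced for illustration, and the real IngestionFormat and IngestionOptions may look different.

```rust
use std::path::Path;

/// Hypothetical stand-in for the crate's format enum; the real variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFormat {
    Csv,
    Json,
    Parquet,
    Excel,
}

/// Pick a format: an explicit override wins, otherwise fall back to the
/// file extension (compared case-insensitively). Returns None when the
/// extension is missing or not recognised.
fn detect_format(path: &Path, override_format: Option<SketchFormat>) -> Option<SketchFormat> {
    if let Some(format) = override_format {
        return Some(format);
    }
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(SketchFormat::Csv),
        "json" => Some(SketchFormat::Json),
        "parquet" => Some(SketchFormat::Parquet),
        // Assumed mapping: both common Excel extensions go to one variant.
        "xlsx" | "xls" => Some(SketchFormat::Excel),
        _ => None,
    }
}
```

Returning an `Option` here leaves it to the caller to decide whether an unknown extension is an error or a reason to ask for an explicit override.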
For append-only ordered batches, use ingest_from_ordered_paths (concatenate rows, then
apply the watermark filter once). For stable directory listings, see paths_from_directory_scan
and the partition module docs on deterministic ordering.
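The "concatenate rows, then apply the watermark filter once" idea, together with a deterministic path order, can be sketched like this. Everything here is an assumption for illustration: `Row`, `stable_order`, and `concat_then_filter` are hypothetical, the ordering is plain lexicographic sorting, and the filter is assumed to keep rows strictly above the watermark. The crate's actual comparison and ordering rules are not stated on this page.

```rust
use std::path::PathBuf;

/// Hypothetical row type: (ordering key, payload).
type Row = (i64, String);

/// Sort paths so that repeated scans of the same directory yield the same order.
fn stable_order(mut paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths.sort();
    paths
}

/// Concatenate the batches in the given order, then filter a single time.
/// Assumption: rows are kept only when their key is strictly greater than
/// the watermark; with no watermark, every row is kept.
fn concat_then_filter(batches: Vec<Vec<Row>>, watermark: Option<i64>) -> Vec<Row> {
    let all: Vec<Row> = batches.into_iter().flatten().collect();
    match watermark {
        Some(w) => all.into_iter().filter(|(key, _)| *key > w).collect(),
        None => all,
    }
}
```

Filtering once after concatenation means the watermark is evaluated against the combined data set rather than separately per file.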
For ergonomic configuration, prefer IngestionOptionsBuilder over constructing
IngestionOptions directly.
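To show why a builder tends to be more ergonomic than filling in a struct by hand, here is a generic builder-pattern sketch. The types `SketchOptions` and `SketchOptionsBuilder` and their fields are hypothetical; this page does not list the fields of IngestionOptions or the methods of IngestionOptionsBuilder, so none of these names should be read as the crate's API.

```rust
/// Hypothetical options struct; the real IngestionOptions fields are not shown here.
#[derive(Debug, Clone, PartialEq, Default)]
struct SketchOptions {
    format_override: Option<String>,
    watermark_column: Option<String>,
}

/// Hypothetical builder: starts from defaults and sets only what the caller names.
#[derive(Debug, Default)]
struct SketchOptionsBuilder {
    opts: SketchOptions,
}

impl SketchOptionsBuilder {
    fn new() -> Self {
        Self::default()
    }

    /// Force a format instead of relying on extension detection.
    fn format(mut self, name: &str) -> Self {
        self.opts.format_override = Some(name.to_string());
        self
    }

    /// Name the column used for incremental filtering.
    fn watermark_column(mut self, column: &str) -> Self {
        self.opts.watermark_column = Some(column.to_string());
        self
    }

    /// Consume the builder and return the finished options.
    fn build(self) -> SketchOptions {
        self.opts
    }
}
```

A call chain such as `SketchOptionsBuilder::new().format("csv").build()` leaves every unspecified field at its default, which is the main convenience over a struct literal.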
Format-specific functions are also available under:
Re-exports§
pub use builder::IngestionOptionsBuilder;
pub use observability::CompositeObserver;
pub use observability::FileObserver;
pub use observability::IngestionContext;
pub use observability::IngestionObserver;
pub use observability::IngestionSeverity;
pub use observability::IngestionStats;
pub use observability::StdErrObserver;
pub use partition::PartitionSegment;
pub use partition::PartitionedFile;
pub use partition::discover_hive_partitioned_files;
pub use partition::hive_segments_for_relative_parent;
pub use partition::parse_partition_segment;
pub use partition::paths_from_directory_scan;
pub use partition::paths_from_explicit_list;
pub use partition::paths_from_glob;
pub use unified::ExcelSheetSelection;
pub use unified::IngestionFormat;
pub use unified::IngestionOptions;
pub use unified::IngestionRequest;
pub use unified::OrderedBatchIngestMetadata;
pub use unified::infer_schema_from_path;
pub use unified::ingest_from_ordered_paths;
pub use unified::ingest_from_path;
pub use unified::ingest_from_path_infer;
pub use watermark::apply_watermark_after_ingest;
pub use watermark::apply_watermark_filter;
pub use watermark::max_value_in_column;
pub use watermark::validate_watermark_config;
pub use db::ingest_from_db;
pub use db::ingest_from_db_infer;
Modules§
- builder
- csv
- CSV ingestion implementation.
- db
- Direct DB ingestion stubs when db_connectorx is disabled.
- excel
- Excel ingestion stubs when the excel feature is disabled.
- json
- JSON ingestion implementation.
- observability
- parquet
- Parquet ingestion implementation.
- partition
- Hive-style partition path discovery and helpers to resolve glob patterns or explicit file lists — single-process only (no distributed coordinator).
- unified
- Unified ingestion entrypoint.
- watermark
- High-water / incremental row filter applied after ingest (file or DB).
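As an illustration of the Hive-style partition paths handled by the partition module, the sketch below splits `key=value` directory components out of a relative parent path. The functions `parse_segment` and `segments_for_parent` are hypothetical and only loosely mirror the names parse_partition_segment and hive_segments_for_relative_parent; their real signatures and edge-case behaviour are not documented on this page. The sketch also assumes `/` as the separator and silently skips components that are not partitions.

```rust
/// Split one Hive-style path component such as `year=2024` into (key, value).
/// Returns None for components without `=` or with an empty key.
fn parse_segment(component: &str) -> Option<(String, String)> {
    let (key, value) = component.split_once('=')?;
    if key.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Collect partition segments from a relative parent directory, in path order.
/// Assumption: components are separated by `/`; non-partition components are ignored.
fn segments_for_parent(relative_parent: &str) -> Vec<(String, String)> {
    relative_parent
        .split('/')
        .filter_map(parse_segment)
        .collect()
}
```

Keeping the segments in path order preserves the partition hierarchy, so `year` comes before `month` for a path like `year=2024/month=01`.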