Module ingestion


Ingestion entrypoints and implementations.

Most callers should use ingest_from_path (from unified).

For append-only ordered batches, use ingest_from_ordered_paths (concatenate rows, then apply the watermark filter once). For stable directory listings, see paths_from_directory_scan and partition module docs on deterministic ordering.
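The "concatenate rows, then apply the watermark filter once" behaviour can be sketched in isolation. This is a minimal illustration only, not the crate's implementation: the `Row` type, the `updated_at` column, the function names, and the strictly-greater-than comparison are all assumptions made for the example, and the real signatures of ingest_from_ordered_paths and apply_watermark_filter are not shown here.

```rust
/// Hypothetical row type for illustration; not the crate's dataset type.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    /// Hypothetical watermark column (for example an epoch timestamp).
    pub updated_at: i64,
}

/// Concatenate batches in the order given, then filter once.
/// Assumption: a row is kept when `updated_at` is strictly greater than the watermark.
pub fn concat_then_filter(batches: Vec<Vec<Row>>, watermark: Option<i64>) -> Vec<Row> {
    // Step 1: append-only concatenation preserves batch order and row order.
    let mut rows: Vec<Row> = batches.into_iter().flatten().collect();

    // Step 2: a single filter pass over the combined rows, not one per batch.
    if let Some(w) = watermark {
        rows.retain(|r| r.updated_at > w);
    }
    rows
}

/// Largest `updated_at` among the rows: a candidate for the next run's watermark.
pub fn max_updated_at(rows: &[Row]) -> Option<i64> {
    rows.iter().map(|r| r.updated_at).max()
}
```

Filtering after concatenation means the watermark is evaluated against one combined row set, so a late row in an early batch and an early row in a late batch are treated identically.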

For ergonomic configuration, prefer IngestionOptionsBuilder over constructing IngestionOptions directly.

Format-specific functions are also available under the format submodules listed under Modules (for example csv, json, and parquet).

Re-exports

pub use builder::IngestionOptionsBuilder;
pub use delta_lake::delta_table_uri;
pub use delta_lake::write_dataset_to_delta_table;
pub use file_transfer::file_transfer_scheme;
pub use file_transfer::ingest_from_file_transfer_uri;
pub use file_transfer::is_file_transfer_uri;
pub use object_store::export_dataset_to_object_store_uri;
pub use object_store::ingest_from_object_store_uri;
pub use observability::CompositeObserver;
pub use observability::FileObserver;
pub use observability::IngestionContext;
pub use observability::IngestionObserver;
pub use observability::IngestionSeverity;
pub use observability::IngestionStats;
pub use observability::StdErrObserver;
pub use partition::PartitionSegment;
pub use partition::PartitionedFile;
pub use partition::discover_hive_partitioned_files;
pub use partition::hive_segments_for_relative_parent;
pub use partition::parse_partition_segment;
pub use partition::paths_from_directory_scan;
pub use partition::paths_from_explicit_list;
pub use partition::paths_from_glob;
pub use snowflake::copy_into_table_from_stage;
pub use snowflake::write_dataset_to_snowflake_stage;
pub use unified::ExcelSheetSelection;
pub use unified::IngestionFormat;
pub use unified::IngestionOptions;
pub use unified::IngestionRequest;
pub use unified::OrderedBatchIngestMetadata;
pub use unified::export_dataset_to_arrow_ipc;
pub use unified::export_dataset_to_parquet;
pub use unified::export_dataset_to_xml;
pub use unified::infer_schema_from_path;
pub use unified::ingest_from_ordered_paths;
pub use unified::ingest_from_path;
pub use unified::ingest_from_path_infer;
pub use watermark::apply_watermark_after_ingest;
pub use watermark::apply_watermark_filter;
pub use watermark::max_value_in_column;
pub use watermark::validate_watermark_config;
pub use db::ingest_from_db;
pub use db::ingest_from_db_infer;

Modules

builder
csv
CSV ingestion implementation.
db
Direct DB ingestion stubs when db_connectorx is disabled.
delta_lake
excel
Excel ingestion stubs when the excel feature is disabled.
file_transfer
json
JSON ingestion implementation.
object_store
observability
parquet
Parquet ingestion implementation.
partition
Hive-style partition path discovery and helpers to resolve glob patterns or explicit file lists — single-process only (no distributed coordinator).
snowflake
Snowflake load: write Parquet to a stage URI (S3 / GCS / ABFS / file://), then optional COPY INTO.
unified
Unified ingestion entrypoint.
watermark
High-water / incremental row filter applied after ingest (file or DB).
xml
Row-oriented XML interchange (<rdp_records><record>…</record></rdp_records>).
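
The Hive-style partition layout mentioned for the partition module encodes column values as key=value directory names. As a rough sketch of that convention, assuming nothing about the crate's own parse_partition_segment or hive_segments_for_relative_parent signatures, the parsing step might look like the following. The function names and the (String, String) return shape are hypothetical.

```rust
use std::path::Path;

/// Parse one directory name of the form `key=value`.
/// Returns `None` for names without `=` or with an empty key or value.
/// Hypothetical helper; not the crate's `parse_partition_segment`.
pub fn parse_segment(segment: &str) -> Option<(String, String)> {
    let (key, value) = segment.split_once('=')?;
    if key.is_empty() || value.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

/// Collect every `key=value` component of a relative directory path,
/// in path order, skipping components that are not partition segments.
pub fn segments_for_parent(relative_parent: &Path) -> Vec<(String, String)> {
    relative_parent
        .components()
        // Skip components that are not valid UTF-8.
        .filter_map(|c| c.as_os_str().to_str())
        .filter_map(parse_segment)
        .collect()
}
```

For a relative parent such as year=2024/month=01, this yields the pairs (year, 2024) and (month, 01) in path order. Values stay as strings here; any typing of partition values is left out of the sketch.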