
APIs to read from Parquet format.

Re-exports

pub use parquet2::fallible_streaming_iterator;
pub use schema::infer_schema;

Modules

APIs to handle Parquet <-> Arrow schemas.

APIs exposing parquet2’s statistics as arrow’s statistics.

Structs

Metadata for a column chunk.

A descriptor for leaf-level primitive columns. This encapsulates information such as definition and repetition levels and is used to re-assemble nested data.

A CompressedDataPage is a compressed, encoded representation of a Parquet data page. It holds actual data and thus cloning it is expensive.

A DataPage is an uncompressed, encoded representation of a Parquet data page. It holds actual data and thus cloning it is expensive.

Decompressor that allows re-using the page buffer of PageIterator.

Metadata for a Parquet file.

An iterator of Chunks coming from row groups of a parquet file.

A page iterator iterates over a row group’s pages. In parquet, pages are guaranteed to be contiguously arranged in memory and therefore must be read in sequence.

A MutStreamingIterator of pre-read column chunks

An Iterator of Chunk that (dynamically) adapts a vector of iterators of Array into an iterator of Chunk.
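The adaptation described above can be sketched with plain standard-library types. This is a minimal illustration, not the crate's implementation: `Array`, `ArrayIter`, `Chunk`, `ChunkAdapter` and `column` are hypothetical stand-ins defined here, and an "array" is simply a `Vec<i32>`. Each call to `next` pulls one array from every column iterator and bundles them into one chunk.

```rust
// Hypothetical stand-ins for illustration only (not the crate's types).
type Array = Vec<i32>;
type ArrayIter = Box<dyn Iterator<Item = Result<Array, String>>>;

// A chunk holds one array per column, all covering the same rows.
struct Chunk {
    columns: Vec<Array>,
}

// Adapts a vector of per-column iterators into one iterator of chunks.
struct ChunkAdapter {
    iters: Vec<ArrayIter>,
}

impl Iterator for ChunkAdapter {
    type Item = Result<Chunk, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.iters.is_empty() {
            return None;
        }
        let mut columns = Vec::with_capacity(self.iters.len());
        for iter in self.iters.iter_mut() {
            match iter.next() {
                Some(Ok(array)) => columns.push(array),
                // Propagate the first error encountered in any column.
                Some(Err(e)) => return Some(Err(e)),
                // Stop as soon as any column is exhausted.
                None => return None,
            }
        }
        Some(Ok(Chunk { columns }))
    }
}

// Helper that wraps in-memory arrays as a column iterator.
fn column(arrays: Vec<Array>) -> ArrayIter {
    Box::new(arrays.into_iter().map(Ok::<Array, String>))
}

fn main() {
    let adapter = ChunkAdapter {
        iters: vec![
            column(vec![vec![1, 2], vec![3]]),
            column(vec![vec![10, 20], vec![30]]),
        ],
    };
    for chunk in adapter {
        let chunk = chunk.expect("no column failed");
        println!("{:?}", chunk.columns);
    }
}
```

In this sketch the adapter stops at the shortest column. Whether the real type behaves that way or reports a length mismatch is not stated in the description above.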

Metadata for a row group.

An Iterator<Item=RowGroupDeserializer> from row groups of a parquet file.

Timestamp logical type annotation

Enums

Errors generated by this crate

Representation of a Parquet type. Used to describe primitive leaf fields and structs, including top-level schema. Note that the top-level schema type is represented using GroupType whose repetition is None.

Traits

Trait describing a MutStreamingIterator of column chunks.

A fallible, streaming iterator.
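To make "fallible" and "streaming" concrete, here is a minimal sketch of the general pattern. The trait `FallibleStreaming` and the type `Upper` are hypothetical and defined locally; they only mirror the usual advance/get shape and are not the re-exported trait itself. "Streaming" means each item is borrowed from the iterator's internal buffer, which is reused between steps; "fallible" means advancing can return an error.

```rust
// Hypothetical local trait illustrating the pattern (not the re-exported trait).
trait FallibleStreaming {
    type Item: ?Sized;
    type Error;

    // Moves to the next item; may fail.
    fn advance(&mut self) -> Result<(), Self::Error>;
    // Borrows the current item, or None once exhausted.
    fn get(&self) -> Option<&Self::Item>;

    // Convenience: advance, then borrow the current item.
    fn next(&mut self) -> Result<Option<&Self::Item>, Self::Error> {
        self.advance()?;
        Ok(self.get())
    }
}

// Upper-cases each input into a single reused buffer; empty input is an error.
struct Upper {
    inputs: std::vec::IntoIter<&'static str>,
    buffer: String,
    valid: bool,
}

impl Upper {
    fn new(inputs: Vec<&'static str>) -> Self {
        Upper { inputs: inputs.into_iter(), buffer: String::new(), valid: false }
    }
}

impl FallibleStreaming for Upper {
    type Item = str;
    type Error = String;

    fn advance(&mut self) -> Result<(), String> {
        match self.inputs.next() {
            Some(s) if s.is_empty() => {
                self.valid = false;
                Err("empty input".to_string())
            }
            Some(s) => {
                // Reuse the same allocation for every item.
                self.buffer.clear();
                self.buffer.extend(s.chars().map(|c| c.to_ascii_uppercase()));
                self.valid = true;
                Ok(())
            }
            None => {
                self.valid = false;
                Ok(())
            }
        }
    }

    fn get(&self) -> Option<&str> {
        if self.valid { Some(self.buffer.as_str()) } else { None }
    }
}

// Consumes any such iterator over `str`, stopping at the first error.
fn collect_lengths<I: FallibleStreaming<Item = str>>(mut it: I) -> Result<Vec<usize>, I::Error> {
    let mut out = Vec::new();
    while let Some(item) = it.next()? {
        out.push(item.len());
    }
    Ok(out)
}

fn main() {
    println!("{:?}", collect_lengths(Upper::new(vec!["ab", "cde"])));
}
```

Because each item borrows from the iterator, it cannot outlive the next call to `advance`. That restriction is what allows a single page buffer to be reused rather than allocated per item.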

Functions

Returns a new PageIterator by seeking reader to the beginning of column_chunk.

Returns a stream of compressed data pages

Reads a file’s metadata.

An iterator adapter that maps multiple iterators of DataPages into an iterator of Arrays.

Decompresses the page, using buffer for decompression. If page.buffer.len() == 0, there was no decompression and the buffer was moved. Else, decompression took place.

Returns a ColumnIterator of column chunks corresponding to field. Unlike get_page_iterator, which returns a single iterator of pages, this iterator returns multiple iterators, one per physical column of the field. For primitive fields (e.g. i64), ColumnIterator yields exactly one column. For complex fields, it yields multiple columns.
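The relationship between a field and its physical columns can be illustrated with a small model. This is a sketch under stated assumptions: `ParquetType` here is a hypothetical, simplified enum defined for the example (not the crate's type), and `leaf_paths` is an illustrative helper. In Parquet, every primitive leaf of a field is stored as its own physical column, so a primitive field maps to one column and a nested field maps to one column per leaf.

```rust
// Hypothetical, simplified schema model for illustration only.
enum ParquetType {
    Primitive { name: String },
    Group { name: String, fields: Vec<ParquetType> },
}

// Returns the dotted path of every primitive leaf under `ty`.
// Each returned path corresponds to one physical column.
fn leaf_paths(ty: &ParquetType) -> Vec<String> {
    match ty {
        ParquetType::Primitive { name } => vec![name.clone()],
        ParquetType::Group { name, fields } => fields
            .iter()
            .flat_map(leaf_paths)
            .map(|child| format!("{}.{}", name, child))
            .collect(),
    }
}

fn primitive(name: &str) -> ParquetType {
    ParquetType::Primitive { name: name.to_string() }
}

fn main() {
    // A primitive field: exactly one physical column.
    let id = primitive("id");
    println!("{:?}", leaf_paths(&id));

    // A struct-like field with a nested group: one column per leaf.
    let user = ParquetType::Group {
        name: "user".to_string(),
        fields: vec![
            primitive("name"),
            ParquetType::Group {
                name: "address".to_string(),
                fields: vec![primitive("city"), primitive("zip")],
            },
        ],
    };
    println!("{:?}", leaf_paths(&user));
}
```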

Creates a new iterator of compressed pages.

Reads all columns that are part of the parquet field field_name

Reads all columns that are part of the parquet field field_name

Returns a vector of iterators of Array (ArrayIter) corresponding to the top-level parquet fields whose names match the names in fields.

Returns a vector of iterators of Array corresponding to the top-level parquet fields whose names match the names in fields.

Reads a parquet file’s metadata synchronously.

Reads a parquet file’s metadata asynchronously.

Converts a vector of columns associated with the parquet field whose name is Field to an iterator of Array, ArrayIter of chunk size chunk_size.

Type Definitions

Type def for a sharable, boxed dyn Iterator of arrays

Type declaration for a page filter