Modules
Traits and utilities for temporal data.
Data types supported by Polars.
io_json
Convert data between the Arrow memory format and JSON line-delimited records.
io_json
APIs to read from and write to NDJSON
io_csv_write
APIs to write to CSV
Macros
Structs
A thread-safe reference-counting pointer. ‘Arc’ stands for ‘Atomically Reference Counted’.
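As a minimal sketch of what "atomically reference counted" means in practice, the std-only example below shares one buffer between threads. The helper name `parallel_sums` is hypothetical and exists only for this illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Sums a shared buffer from several threads, each holding its own `Arc` handle.
/// (`parallel_sums` is a hypothetical helper, not part of any library.)
fn parallel_sums(data: Arc<Vec<i64>>, n_threads: usize) -> Vec<i64> {
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Cloning an `Arc` only bumps the atomic reference count;
            // the underlying Vec is not copied.
            let shared = Arc::clone(&data);
            thread::spawn(move || shared.iter().sum::<i64>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4]);
    let second = Arc::clone(&data);
    println!("strong count: {}", Arc::strong_count(&data));
    println!("sums: {:?}", parallel_sums(second, 2));
}
```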
Represents Arrow’s metadata of a “column”.
Specialized expressions for Categorical dtypes.
ChunkedArray
Create a new DataFrame by reading a csv file.
Write a DataFrame to csv.
A contiguous growable collection of Series that have the same length.
Returned by a groupby operation on a DataFrame. This struct supports several aggregations.
Indexes of the groups; the first index is stored separately. This makes sorting fast.
Read Arrow’s IPC format into a DataFrame
Write a DataFrame to Arrow’s IPC format
Lazy abstraction over an eager DataFrame. It really is an abstraction over a logical plan. The methods of this struct will incrementally modify a logical plan until output is requested (via collect).
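The idea of methods that only record plan steps until collect is called can be sketched with std-only Rust. Everything here (`LazyVec`, `Step`, `filter_gt`, `scale`) is a hypothetical toy, not the Polars LazyFrame or its logical plan.

```rust
/// Hypothetical plan steps; not Polars' actual logical plan.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Filter(i64), // keep values strictly greater than the threshold
    Scale(i64),  // multiply every value by the factor
}

/// Hypothetical lazy wrapper: holds the source data and the recorded plan.
struct LazyVec {
    source: Vec<i64>,
    plan: Vec<Step>,
}

impl LazyVec {
    fn new(source: Vec<i64>) -> Self {
        LazyVec { source, plan: Vec::new() }
    }

    /// Records a filter step; no data is touched yet.
    fn filter_gt(mut self, threshold: i64) -> Self {
        self.plan.push(Step::Filter(threshold));
        self
    }

    /// Records a scaling step; no data is touched yet.
    fn scale(mut self, factor: i64) -> Self {
        self.plan.push(Step::Scale(factor));
        self
    }

    /// Executes the recorded steps in order and returns the result.
    fn collect(self) -> Vec<i64> {
        let mut out = self.source;
        for step in &self.plan {
            out = match step {
                Step::Filter(t) => out.into_iter().filter(|v| v > t).collect(),
                Step::Scale(f) => out.into_iter().map(|v| v * f).collect(),
            };
        }
        out
    }
}

fn main() {
    let lazy = LazyVec::new(vec![1, 5, 10]).filter_gt(2).scale(3);
    println!("plan: {:?}", lazy.plan);
    println!("result: {:?}", lazy.collect());
}
```

A real engine would also optimize the recorded plan before running it; this sketch skips that step.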
Utility struct for lazy groupby operation.
Maps a logical type to a chunked array implementation of the physical type. This saves a lot of compiler bloat and allows us to reuse functionality.
Wrapper type that indicates that the inner type is not equal to anything
Just a wrapper structure. Useful for certain impl specializations.
This is for instance used to implement
impl<T> FromIterator<T::Native> for NoNull<ChunkedArray<T>>
as Option<T::Native> was already implemented:
impl<T> FromIterator<Option<T::Native>> for ChunkedArray<T>
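The wrapper pattern can be sketched with std-only Rust. The types below (`Numeric`, `Int64Type`, `Column`) are hypothetical stand-ins that only mirror the shape of the two impls quoted above. The likely motivation, stated here as an assumption, is that with an associated item type the compiler cannot prove the two FromIterator impls disjoint on the same Self type, so the second impl targets a wrapper instead.

```rust
use std::iter::FromIterator;

/// Hypothetical marker trait with an associated native type.
trait Numeric {
    type Native;
}

/// Hypothetical marker type for 64-bit integers.
struct Int64Type;

impl Numeric for Int64Type {
    type Native = i64;
}

/// Hypothetical nullable column; a stand-in, not Polars' ChunkedArray.
struct Column<T: Numeric> {
    values: Vec<Option<T::Native>>,
}

/// Wrapper marking that the inner column was built from non-null items.
struct NoNull<C>(C);

// Build a column from optional values.
impl<T: Numeric> FromIterator<Option<T::Native>> for Column<T> {
    fn from_iter<I: IntoIterator<Item = Option<T::Native>>>(iter: I) -> Self {
        Column { values: iter.into_iter().collect() }
    }
}

// Build a column from plain values. The impl is placed on the wrapper so
// that its Self type differs from the impl above.
impl<T: Numeric> FromIterator<T::Native> for NoNull<Column<T>> {
    fn from_iter<I: IntoIterator<Item = T::Native>>(iter: I) -> Self {
        NoNull(Column { values: iter.into_iter().map(Some).collect() })
    }
}

fn main() {
    let dense: NoNull<Column<Int64Type>> = vec![1i64, 2, 3].into_iter().collect();
    let sparse: Column<Int64Type> = vec![Some(1i64), None].into_iter().collect();
    println!("{:?} {:?}", dense.0.values, sparse.values);
}
```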
The literal Null
object
State of the allowed optimizations
Read Apache parquet format into a DataFrame.
Write a DataFrame to parquet format
Wrapper struct that allows us to use a PhysicalExpr in polars-io.
Series
This is the logical type StructChunked, which dispatches most logic to the fields’ implementations.
Intermediate state of when(..).then(..).otherwise(..) expr.
Intermediate state of when(..).then(..).otherwise(..) expr.
Intermediate state of chain when then exprs.
Represents a window in time
Enums
The set of supported logical types in this crate.
The time units defined in Arrow.
Queries consist of multiple expressions.
Compression codec
One of the three arguments allowed in unchecked_take
Constants
Traits
Argmin/Argmax
Aggregation operations
Aggregations that return Series of unit length. Those can be used in broadcasting operations.
Fastest way to do elementwise operations on a ChunkedArray
Apply kernels on the arrow array chunks in a ChunkedArray.
Cast ChunkedArray<T> to ChunkedArray<N>
Compare Series and ChunkedArray’s and get a boolean mask that can be used to filter rows.
Create a new ChunkedArray filled with values at that index.
Explode/ flatten a
Replace None values with various strategies
Replace None values with a value
Filter values by a boolean mask.
Fill a ChunkedArray with one value.
Find local minima/maxima
Quantile and median aggregation
Reverse a ChunkedArray
This differs from ChunkWindowCustom and ChunkWindow by not using a fold aggregator, but reusing a Series wrapper and calling Series aggregators. This likely is a bit slower than ChunkWindow.
Create a ChunkedArray with new values by index or by boolean mask. Note that these operations clone data. This is however the only way we can modify at mask or index level, as the underlying Arrow arrays are immutable.
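A minimal std-only sketch of setting values by mask on immutable data: the input is left untouched and a new vector is returned. The function name `set_by_mask` and the use of a plain slice instead of a ChunkedArray are assumptions made for illustration.

```rust
/// Returns a new vector in which every position where `mask` is true is
/// replaced by `value`. The input slice is never modified.
/// (`set_by_mask` is a hypothetical helper, not the Polars API.)
fn set_by_mask(values: &[Option<i32>], mask: &[bool], value: Option<i32>) -> Vec<Option<i32>> {
    assert_eq!(values.len(), mask.len(), "mask must match the data length");
    values
        .iter()
        .zip(mask)
        .map(|(v, &m)| if m { value } else { *v })
        .collect()
}

fn main() {
    let original = vec![Some(1), None, Some(3)];
    let updated = set_by_mask(&original, &[false, true, true], Some(0));
    println!("original: {:?}", original);
    println!("updated:  {:?}", updated);
}
```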
Shift the values of a ChunkedArray by a number of periods.
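A shift by a number of periods can be sketched as follows. The conventions used here are assumptions, not taken from the description above: a positive period moves values toward higher indexes, and vacated slots are filled with None.

```rust
/// Shifts `values` by `periods` positions and fills vacated slots with `None`.
/// Positive periods shift toward higher indexes (an assumed convention).
/// (`shift` here is a hypothetical free function, not the Polars trait method.)
fn shift(values: &[i32], periods: i64) -> Vec<Option<i32>> {
    let len = values.len() as i64;
    (0..len)
        .map(|i| {
            // Index of the source element that lands at position `i`.
            let src = i - periods;
            if src >= 0 && src < len {
                Some(values[src as usize])
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    println!("{:?}", shift(&[1, 2, 3], 1));
    println!("{:?}", shift(&[1, 2, 3], -1));
}
```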
Sort operations on ChunkedArray.
Fast access by index.
Traverse and collect every nth element
Get unique values in a ChunkedArray
Variance and standard deviation aggregation.
Combine 2 ChunkedArrays based on some predicate.
Executors will evaluate physical expressions and collect them in a DataFrame.
This trait exists to unify the API of the Polars Schema and Arrow’s Schema.
Used to create the tuples for a groupby operation.
Create a type that implements a faster TakeRandom.
is_first
Mask the first unique values as true
is_in
Check if element is member of list array
is_first
Mask the last unique values as true
Take a DataFrame and evaluate the expressions. Implement this for Column, lt, eq, etc
A type that implements this transforms a LogicalPlan to a physical plan.
A PolarsIterator is an iterator over a ChunkedArray which contains polars types. A PolarsIterator must implement ExactSizeIterator and DoubleEndedIterator.
Values need to implement this so that they can be stored into a Series and DataFrame
Any type that is not nested
repeat_by
Repeat the values n times.
A wrapper trait for any binary closure Fn(Series, Series) -> Result<Series>
A wrapper trait for any closure Fn(Vec<Series>) -> Result<Series>
concat_str
Concat the values into a string array.
Random access
Functions
Selects all columns
Evaluate all the expressions with a bitwise and
Evaluate all the expressions with a bitwise or
Apply a function/closure over the groups of multiple columns. This should only be used in a groupby aggregation.
arange
Create list entries that are range arrays
Find the indexes that would sort these series in order of appearance. That means that the first Series will be used to determine the ordering until duplicates are found. Once duplicates are found, the next Series will be used and so on.
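The tie-breaking rule described above can be sketched with std-only Rust: compare rows on the first column, and fall through to the next column only when the values are equal. The function name `argsort_by` is reused for readability, but the signature over plain `Vec<i32>` columns is an assumption.

```rust
use std::cmp::Ordering;

/// Returns the row indexes that sort the rows by the first column, using
/// later columns only to break ties. Assumes all columns have equal length.
/// (Hypothetical sketch over plain vectors, not the Polars function.)
fn argsort_by(columns: &[Vec<i32>]) -> Vec<usize> {
    let len = columns.first().map_or(0, |c| c.len());
    assert!(columns.iter().all(|c| c.len() == len), "columns must have equal length");

    let mut idx: Vec<usize> = (0..len).collect();
    idx.sort_by(|&a, &b| {
        columns
            .iter()
            .map(|col| col[a].cmp(&col[b]))
            // The first column where the two rows differ decides the order.
            .find(|ord| *ord != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    });
    idx
}

fn main() {
    let first = vec![2, 1, 1];
    let second = vec![0, 5, 3];
    println!("{:?}", argsort_by(&[first, second]));
}
```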
Take several expressions and collect them into a StructChunked.
Find the mean of all the values in this Expression.
Create a Column Expression based on a column name.
Collect all LazyFrame computations.
Select multiple columns by name
Concat multiple
list
Concat list entries.
concat_str
Horizontally concat string columns in linear time
Count expression
Compute the covariance between two columns.
Create a DatetimeChunked from a given start and stop date and a given every interval.
Select multiple columns by dtype.
Select multiple columns by dtype.
First column in DataFrame
Accumulate over multiple columns horizontally / row wise.
Different from groupby_windows, where we define window buckets and search which values fit each pre-defined bucket, this function defines every window based on the:
- timestamp (lower bound)
- timestamp + period (upper bound)
where timestamps are the individual values in the array time.
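The per-timestamp windows described above can be sketched as follows. Several details are assumptions not stated in the description: the time array is sorted ascending, the lower bound is inclusive, the upper bound is exclusive, and each window is reported as a (start index, length) pair.

```rust
/// For every timestamp `t` in `time`, finds the window `[t, t + period)` and
/// returns it as `(start_index, length)`.
/// Assumes `time` is sorted ascending and `period` is positive; the bound
/// inclusivity is an assumption of this sketch.
fn rolling_windows(time: &[i64], period: i64) -> Vec<(usize, usize)> {
    time.iter()
        .enumerate()
        .map(|(i, &t)| {
            let upper = t + period;
            // Count values from position `i` onward that are below the upper bound.
            let len = time[i..].iter().take_while(|&&x| x < upper).count();
            (i, len)
        })
        .collect()
}

fn main() {
    let time = [1, 2, 4, 7];
    println!("{:?}", rolling_windows(&time, 3));
}
```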
Based on the given Window, which has an
IsNotNull expression.
Last column in DataFrame
Create a Literal Expression from L
Apply a closure on the two columns that are evaluated from Expr a and Expr b.
Apply a function/closure over multiple columns once the logical plan gets executed.
Apply a function/closure over multiple columns once the logical plan gets executed.
Find the maximum of all the values in this Expression.
Get the maximum value per row
Find the mean of all the values in this Expression.
Find the median of all the values in this Expression.
Find the minimum of all the values in this Expression.
Get the minimum value per row
Compute the Pearson correlation between two columns.
Find a specific quantile of all the values in this Expression.
Create a range literal.
Compute the Spearman rank correlation between two columns.
Sum all the values in this Expression.
Get the sum of the values per row
Start a when-then-otherwise expression
Type Definitions
AllowedOptimizations
Typedef for a std::result::Result of an ArrowError.
Dummy type, we need to instantiate all generic types, so we fill one with a dummy.
Every group is indicated by an array where the
The type used by polars to index data.