API Reference

intake_duckdb.DuckDBSource(*args, **kwargs)

DuckDB table to dataframe reader.

intake_duckdb.DuckDBCatalog(*args, **kwargs)

Makes data source catalog out of known tables in the given DuckDB instance.

intake_duckdb.DuckDBTransform(*args, **kwargs)

Run a DuckDB query on any Intake source for which .read() produces a pandas DataFrame.

class intake_duckdb.DuckDBSource(*args, **kwargs)

Reads a DuckDB table into a dataframe. Accepts either a table name or a SQL expression, and supports partitioning.

The entire dataframe is cached in memory.

Parameters:
uri: str or None
    Path to a local DuckDB file
sql_expr: str or None
    Query expression to pass to the DB backend
connection: duckdb.DuckDBPyConnection or None
    Existing connection to the DB backend
table: str or None
    Table name
chunks: int or None
    Number of partitions; the default is 1
metadata: dict
    Additional metadata to pass to the parent class

read()

Load entire dataset into a container and return it
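As a minimal sketch of how the parameters above fit together, the helper below collects keyword arguments for DuckDBSource and then calls read(). The helper names source_kwargs and read_dataframe, the file name data.duckdb, and the table name trips are hypothetical. The rule that exactly one of table and sql_expr is supplied is an assumption drawn from the description above, not confirmed library behavior.

```python
def source_kwargs(uri, table=None, sql_expr=None, chunks=1):
    """Collect keyword arguments for DuckDBSource (hypothetical helper)."""
    # Assumption: supply a table name or a SQL expression, but not both.
    if (table is None) == (sql_expr is None):
        raise ValueError("pass exactly one of 'table' or 'sql_expr'")
    kwargs = {"uri": uri, "chunks": chunks}
    if table is not None:
        kwargs["table"] = table
    else:
        kwargs["sql_expr"] = sql_expr
    return kwargs


def read_dataframe(**kwargs):
    """Build the source and load the whole result into memory."""
    # Imported here so source_kwargs can be used without the plugin installed.
    from intake_duckdb import DuckDBSource

    return DuckDBSource(**kwargs).read()


# Usage (hypothetical file and table names):
# df = read_dataframe(**source_kwargs("data.duckdb", table="trips"))
# df = read_dataframe(**source_kwargs("data.duckdb", sql_expr="SELECT * FROM trips LIMIT 10"))
```

Keeping argument assembly separate from the read makes it possible to check the arguments before any database file is opened.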

class intake_duckdb.DuckDBCatalog(*args, **kwargs)

Makes data source catalog out of known tables in the given DuckDB instance.

Parameters:
uri: str or None
    Path to a local DuckDB file
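A short sketch of listing the tables a catalog exposes might look like the following. The function name list_tables and the file name in the usage comment are hypothetical, and the sketch assumes that DuckDBCatalog behaves like other Intake catalogs, which yield their entry names when iterated.

```python
def list_tables(uri):
    """Return the sorted entry names of a DuckDBCatalog (hypothetical helper)."""
    # Imported lazily; requires the intake-duckdb plugin to be installed.
    from intake_duckdb import DuckDBCatalog

    cat = DuckDBCatalog(uri=uri)
    # Assumption: iterating the catalog yields one name per known table.
    return sorted(cat)


# Usage (hypothetical file name):
# for name in list_tables("data.duckdb"):
#     print(name)
```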

class intake_duckdb.DuckDBTransform(*args, **kwargs)

Run a DuckDB query on any Intake source for which .read() produces a pandas DataFrame. Multiple targets within the same catalog can be specified. The entire source of each target is read into memory.

Parameters:
sql_expr: str
    Query expression to pass to the DB backend
targets: list[intake.DataSource] or list[str]
    List of Intake data sources or named sources within the catalog
chunks: int or None
    Number of partitions; the default is 1
metadata: dict
    Additional metadata to pass to the parent class
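To illustrate the idea behind the transform, the sketch below runs a SQL query over pandas DataFrames that have already been read into memory, using only the duckdb package. It is not the plugin's implementation. The function name run_query_on_frames and the table name trips are hypothetical, and referring to each DataFrame in the query by the name it was registered under is an assumption about how targets map onto the SQL.

```python
def run_query_on_frames(sql_expr, frames):
    """Run sql_expr over DataFrames registered under the given names.

    frames maps a table name used in the query to a pandas DataFrame.
    """
    # Imported lazily; requires the duckdb package to be installed.
    import duckdb

    con = duckdb.connect()  # in-memory database
    try:
        for name, frame in frames.items():
            # Expose each DataFrame to SQL as a view with this name.
            con.register(name, frame)
        # Return the query result as a pandas DataFrame.
        return con.execute(sql_expr).df()
    finally:
        con.close()


# Usage (hypothetical data):
# import pandas as pd
# trips = pd.DataFrame({"city": ["Oslo", "Oslo", "Rome"], "fare": [10.0, 14.0, 9.0]})
# run_query_on_frames(
#     "SELECT city, AVG(fare) AS mean_fare FROM trips GROUP BY city ORDER BY city",
#     {"trips": trips},
# )
```

Because every frame is fully materialized before the query runs, memory use grows with the combined size of all targets, which matches the note above that each target is read into memory in full.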