algocomponents.adapters

The adapters are what tasks use to connect to database providers.

All adapters inherit from the base class SQLAdapter. All methods you use to interact with an adapter are defined there, such as connect(), disconnect() and run_sql_string().

Most other adapters define additional helper methods, but understanding these is only necessary if you want to understand the internal workings of that adapter. If you only want to use adapters, the documentation for SQLAdapter is all you need.

Submodules

algocomponents.adapters.custom_exceptions

exception algocomponents.adapters.custom_exceptions.AdapterException

Bases: Exception

exception algocomponents.adapters.custom_exceptions.ColumnMissingException

Bases: AdapterException

exception algocomponents.adapters.custom_exceptions.DataMismatchException

Bases: AdapterException

exception algocomponents.adapters.custom_exceptions.DatabaseMissingException

Bases: AdapterException

exception algocomponents.adapters.custom_exceptions.TableAlreadyExistsException

Bases: AdapterException

exception algocomponents.adapters.custom_exceptions.TableIsEmptyException

Bases: AdapterException

exception algocomponents.adapters.custom_exceptions.TableMissingException

Bases: AdapterException
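
Because every exception above derives from AdapterException, a caller can handle one specific failure and still catch the rest with a single clause. The sketch below illustrates this with local stand-in classes that only mirror the documented hierarchy. describe_failure is a hypothetical helper, and real code would import the exceptions from algocomponents.adapters.custom_exceptions instead.

```python
# Local stand-ins that mirror the documented hierarchy (assumption: real code
# would import these from algocomponents.adapters.custom_exceptions).
class AdapterException(Exception):
    pass


class TableMissingException(AdapterException):
    pass


class ColumnMissingException(AdapterException):
    pass


def describe_failure(exc: Exception) -> str:
    """Classify an error, checking the most specific exception type first."""
    try:
        raise exc
    except TableMissingException:
        return "table missing"
    except AdapterException:
        # Any other adapter error, e.g. ColumnMissingException.
        return "other adapter error"
    except Exception:
        return "unrelated error"
```

Ordering the except clauses from most to least specific matters here, since a leading `except AdapterException` would also swallow the table-specific case.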

Contents

class algocomponents.adapters.BigQueryAdapter(**kwargs)

Bases: SQLAdapter

Used to run queries on BigQuery.

The adapter expects the user to be authenticated in the affected GCP project, following the setup instructions for Google's Python client libraries.

Also reads the config file biq_query_adapter_config.ini in global and local config folders, if present.

adapter_specific_filters(sql: str) str

Filters to apply to a query when finding tables or CTEs inside it.

This filter removes EXTRACT, UNNEST and ML-methods from the sql.

Returns:

The filtered sql.

connect()

Connects the adapter.

This method imports the bigquery dependencies. Therefore, bigquery does not need to be installed unless this adapter is used.

The connection is stored in self.client.
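
The deferred import described above can be sketched as follows. This illustrates the general pattern only and is not the adapter's actual code: LazyDependencyAdapter and its dependency argument are hypothetical, and importlib.import_module stands in for the import of the bigquery client so that the example runs without that package installed.

```python
import importlib


class LazyDependencyAdapter:
    """Hypothetical adapter that needs an optional package only when connecting."""

    def __init__(self, dependency: str):
        # Creating the adapter does not touch the optional dependency.
        self.dependency = dependency
        self.client = None

    def connect(self):
        # The import happens here, so a missing package only fails at this point.
        self.client = importlib.import_module(self.dependency)

    def is_connected(self) -> bool:
        return self.client is not None
```

With this layout, a project that never connects such an adapter never triggers the import, which is why the dependency can stay optional.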

count_rows_in_table(*args, **kwargs)
disconnect(*args, **kwargs)

Disconnects the adapter.

Non-abstract adapters extend this method using super().disconnect().

get_table_columns(*args, **kwargs)
insert_pandas_df_into_table(*args, **kwargs)
is_connected() bool

Checks whether the adapter is connected.

As long as there is a self.client, the adapter is considered connected.

Returns:

True if the adapter is connected, False otherwise.

latest_query_as_csv(path: str)

Get the result of the latest query as a csv file.

Parameters:

path – The path to save the csv file to.

latest_query_as_pandas() DataFrame

Get the result of the latest query as a pandas dataframe.

Returns:

A pandas dataframe of the latest query_job.

pandas_df_as_table(*args, **kwargs)
pandas_df_helper_method(*args, **kwargs)
remove_extract_method_calls_from_sql(sql: str) str

Removes EXTRACT method-calls from the sql.

Parameters:

sql – The sql to remove from.

Returns:

The sql without EXTRACT method-calls.
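
A plausible reason for this filter is that EXTRACT(part FROM expression) contains the keyword FROM, which a regex-based table search could mistake for a table reference. The sketch below shows that effect with two hypothetical helpers, strip_extract_calls and naive_from_targets. Both are assumptions for illustration, the pattern ignores nested parentheses, and neither is the adapter's own implementation.

```python
import re

# Hypothetical pattern: EXTRACT( ... ) with no nested parentheses inside.
_EXTRACT_CALL = re.compile(r"\bEXTRACT\s*\([^()]*\)", re.IGNORECASE)


def strip_extract_calls(sql: str) -> str:
    """Delete simple EXTRACT(...) calls so their inner FROM is not scanned."""
    return _EXTRACT_CALL.sub("", sql)


def naive_from_targets(sql: str) -> list:
    """Return every word that directly follows a FROM keyword."""
    return re.findall(r"\bFROM\s+(\w+)", sql, flags=re.IGNORECASE)


query = "SELECT EXTRACT(YEAR FROM created_at) AS y FROM sales"
before = naive_from_targets(query)                      # includes the column
after = naive_from_targets(strip_extract_calls(query))  # only the table
```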

remove_ml_methods_from_sql(sql: str) str

Removes ML method-calls from the sql.

Parameters:

sql – The sql to remove from.

Returns:

The sql without ML method-calls.

remove_unnest_method_calls_from_sql(sql: str) str

Removes UNNEST method-calls from the sql.

Parameters:

sql – The sql to remove from.

Returns:

The sql without UNNEST method-calls.

table_exists(*args, **kwargs)

Checks whether a table exists.

Non-abstract adapters overwrite this method.

Parameters:

table – The table to look for.

class algocomponents.adapters.DatabricksAdapter(**kwargs)

Bases: SparkAdapter

Used to run queries in Databricks notebooks.

Databricks notebooks use Spark, but the notebook itself keeps track of the Spark context and stops it when necessary. Therefore, this adapter works exactly the same way as the SparkAdapter, except that it does not stop the Spark context when disconnecting.

disconnect()

Overwritten to do nothing.
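
The relationship between the two adapters can be sketched with stand-in classes. Everything here is hypothetical (FakeSparkSession, SketchSparkAdapter, SketchDatabricksAdapter), and the assumption that the parent stops the session in disconnect() is inferred from the description above rather than taken from the source code.

```python
class FakeSparkSession:
    """Hypothetical stand-in for a Spark session; records whether it was stopped."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class SketchSparkAdapter:
    def __init__(self, session: FakeSparkSession):
        self.spark = session

    def disconnect(self):
        # Assumed parent behaviour: stop the session the adapter holds.
        self.spark.stop()
        self.spark = None


class SketchDatabricksAdapter(SketchSparkAdapter):
    def disconnect(self):
        # The notebook owns the session, so leave it running.
        pass
```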

class algocomponents.adapters.LocalSqliteAdapter(commit_queries: bool = True, **kwargs)

Bases: SQLAdapter

Used to run queries locally.

This adapter exists so that queries can be run locally for development purposes, such as mocking a number of queries or testing out a pipeline structure.

Also reads the config file local_sqlite_adapter_config.ini in global and local config folders, if present.

Parameters:

commit_queries – Whether tables created should remain once disconnected.
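
The effect of committing or not committing can be illustrated with the standard sqlite3 module directly. This is a sketch of the underlying SQLite behaviour under explicit transaction control, not of how LocalSqliteAdapter implements commit_queries. The helper names, the staging table and the temporary database path are all hypothetical.

```python
import os
import sqlite3
import tempfile


def create_table_then_close(db_path: str, commit: bool) -> None:
    """Create a table inside an explicit transaction, optionally committing it."""
    # isolation_level=None leaves transaction control entirely to the caller.
    connection = sqlite3.connect(db_path, isolation_level=None)
    connection.execute("BEGIN")
    connection.execute("CREATE TABLE staging (id INTEGER)")
    if commit:
        connection.execute("COMMIT")
    # Closing with an open transaction rolls the uncommitted work back.
    connection.close()


def table_survives(commit: bool) -> bool:
    """Report whether the table is still there after reconnecting."""
    db_path = os.path.join(tempfile.mkdtemp(), "sketch.db")
    create_table_then_close(db_path, commit)
    connection = sqlite3.connect(db_path)
    found = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'staging'"
    ).fetchone()
    connection.close()
    return found is not None
```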

connect()

Connects the adapter.

The connection is stored in self.cursor.

count_rows_in_table(*args, **kwargs)
db_file = 'local_sqlite.db'
disconnect()

Disconnects the adapter.

This is done by closing the connection.

get_table_columns(*args, **kwargs)
insert_pandas_df_into_table(*args, **kwargs)
is_connected() bool

Checks whether the adapter is connected.

Checks the existence and connection of self.cursor.

Returns:

True if the adapter is connected, False otherwise.

latest_query_as_csv(path: str)

Get the result of the latest query as a csv file.

Parameters:

path – The path to save the csv file to.
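
As an illustration of what exporting the latest result to CSV involves, the sketch below writes the rows of a sqlite3 cursor to a file using only the standard library. cursor_result_to_csv is a hypothetical helper and the sales table is made up for the example. How the adapter itself produces the file is not specified here.

```python
import csv
import os
import sqlite3
import tempfile


def cursor_result_to_csv(cursor: sqlite3.Cursor, path: str) -> None:
    """Write the rows of the cursor's most recent SELECT to a CSV file."""
    header = [column[0] for column in cursor.description]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor.fetchall())


# Usage with a throwaway in-memory database.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
cursor.execute("INSERT INTO sales VALUES (1, 9.5), (2, 20.0)")
cursor.execute("SELECT id, amount FROM sales ORDER BY id")
csv_path = os.path.join(tempfile.mkdtemp(), "latest_query.csv")
cursor_result_to_csv(cursor, csv_path)
connection.close()

# Read the file back to inspect what was written.
with open(csv_path, newline="") as handle:
    written_rows = list(csv.reader(handle))
```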

latest_query_as_pandas() DataFrame

Get the result of the latest query as a pandas dataframe.

Returns:

A pandas dataframe of the latest query run.

pandas_df_as_table(*args, **kwargs)
table_exists(*args, **kwargs)

Checks whether a table exists.

Non-abstract adapters overwrite this method.

Parameters:

table – The table to look for.

class algocomponents.adapters.SQLAdapter(adapter_config_file: str = None, **kwargs)

Bases: ConfigReader

An abstract adapter used for connecting to a service and running queries.

The purpose of the SQL adapter is to generalize how connections are set up to different databases. Each type of database used in a project should have its own adapter.

SQLAdapters also have their own config files. They are not necessary, but if present they take priority over other config files.
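
The idea of one adapter per database type behind a shared abstract interface can be sketched as follows. SketchAdapter is a hypothetical, reduced stand-in that does not inherit from ConfigReader and only covers four of the documented abstract methods. InMemorySqliteSketch is likewise an assumption for illustration and not the LocalSqliteAdapter implementation.

```python
import sqlite3
from abc import ABC, abstractmethod


class SketchAdapter(ABC):
    """Hypothetical, reduced version of the abstract adapter interface."""

    @abstractmethod
    def connect(self): ...

    @abstractmethod
    def disconnect(self): ...

    @abstractmethod
    def is_connected(self) -> bool: ...

    @abstractmethod
    def table_exists(self, table: str) -> bool: ...


class InMemorySqliteSketch(SketchAdapter):
    """Concrete adapter for one database type (an in-memory SQLite database)."""

    def __init__(self):
        self.connection = None

    def connect(self):
        self.connection = sqlite3.connect(":memory:")

    def disconnect(self):
        if self.connection is not None:
            self.connection.close()
            self.connection = None

    def is_connected(self) -> bool:
        return self.connection is not None

    def table_exists(self, table: str) -> bool:
        row = self.connection.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        return row is not None
```

Code written against the abstract interface can then work with any concrete adapter, while the abstract class itself cannot be instantiated.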

adapter_specific_filters(sql: str) str

Filters to apply to a query when finding tables or CTEs inside it.

Adapters may overwrite this method if they have any filters to apply.

Parameters:

sql – The query an adapter may remove parts of.

Returns:

The sql once said parts are removed.

abstract connect()

Connects the adapter.

Non-abstract adapters extend this method using super().connect().

count_rows_in_table(*args, **kwargs)
default_max_rows_displayed = 20
abstract disconnect()

Disconnects the adapter.

Non-abstract adapters extend this method using super().disconnect().

find_possible_cte_names(sql: str) List[str]

Uses regex to find possible CTE-names in a query.

This method will always find all CTEs, but in some queries it may mistake things that are not CTEs for CTEs and return these as well. It will never return a table name.

Parameters:

sql – The query to find CTEs in.

Returns:

A list of all possible cte names.
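
Why a regex search can over-report CTE names is easiest to see with an example. The sketch below uses a deliberately simple, hypothetical pattern (any word followed by AS and an opening parenthesis). The real method's regex is not documented here and may differ, so possible_cte_names is only an illustration of the trade-off.

```python
import re
from typing import List

# Hypothetical pattern: a word, then AS, then an opening parenthesis.
_CTE_CANDIDATE = re.compile(r"\b(\w+)\s+AS\s*\(", re.IGNORECASE)


def possible_cte_names(sql: str) -> List[str]:
    """Return every name that looks like the start of a CTE definition."""
    return _CTE_CANDIDATE.findall(sql)


query = (
    "WITH recent AS (SELECT id, x FROM orders) "
    "SELECT SUM(x) OVER w FROM recent WINDOW w AS (ORDER BY id)"
)
candidates = possible_cte_names(query)
```

Here the result contains recent, which is a CTE, and also w, which is a named window. The pattern has no way to tell the two apart, which is the kind of false positive described above.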

find_table_names(sql: str, ignore_ctes: bool = True) List[str]

Uses regex to find all table names in a query.

Parameters:
  • sql – The query to find table names in.

  • ignore_ctes – Whether CTEs should be ignored or not, defaults to True.

Returns:

A list of all table names in the sql.
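
The effect of ignore_ctes can be sketched with a simplified search that looks at the names following FROM and JOIN. Both patterns and the table_names helper are assumptions for illustration and do not reproduce the regex used by find_table_names.

```python
import re
from typing import List

# Hypothetical patterns, much simpler than a real SQL parser.
_TABLE_REFERENCE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)
_CTE_DEFINITION = re.compile(r"\b(\w+)\s+AS\s*\(", re.IGNORECASE)


def table_names(sql: str, ignore_ctes: bool = True) -> List[str]:
    """Collect names after FROM/JOIN, optionally dropping CTE names."""
    names = []
    for name in _TABLE_REFERENCE.findall(sql):
        if name not in names:  # keep first occurrence, preserve order
            names.append(name)
    if ignore_ctes:
        ctes = {cte.lower() for cte in _CTE_DEFINITION.findall(sql)}
        names = [name for name in names if name.lower() not in ctes]
    return names


query = (
    "WITH recent AS (SELECT * FROM shop.orders) "
    "SELECT * FROM recent JOIN shop.customers USING (customer_id)"
)
only_tables = table_names(query)
with_ctes = table_names(query, ignore_ctes=False)
```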

get_formatted_queries(sql_string: str, format_variables: Dict[str, str] = None) List[str]
get_table_columns(*args, **kwargs)
insert_pandas_df_into_table(*args, **kwargs)
abstract is_connected() bool

Checks whether the adapter is connected.

Non-abstract adapters overwrite this method.

Returns:

True if the adapter is connected, False otherwise.

abstract latest_query_as_csv(path: str)

Get the result of the latest query as a csv file.

Non-abstract adapters overwrite this method.

Parameters:

path – The path to save the csv file to.

abstract latest_query_as_pandas() DataFrame

Get the result of the latest query as a pandas dataframe.

Non-abstract adapters overwrite this method.

This method is intended for use in method cascading.

Returns:

The result of the latest query as a pandas dataframe.
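
Method cascading here presumably means chaining the call that runs a query with the call that fetches its result. The sketch below shows that style with a hypothetical CascadingSketch class built on sqlite3. It assumes run_sql_string() returns the adapter itself, which this page does not confirm, and it returns plain rows through a made-up latest_query_as_rows() method instead of a pandas dataframe so that the example has no third-party dependencies.

```python
import sqlite3


class CascadingSketch:
    """Hypothetical adapter-like class whose query method returns self."""

    def __init__(self):
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()

    def run_sql_string(self, sql: str) -> "CascadingSketch":
        self.cursor.execute(sql)
        # Returning self is what makes the chained call possible (assumed).
        return self

    def latest_query_as_rows(self) -> list:
        return self.cursor.fetchall()


rows = (
    CascadingSketch()
    .run_sql_string("SELECT 1 AS one, 'a' AS letter")
    .latest_query_as_rows()
)
```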

pandas_df_as_table(*args, **kwargs)
remove_comments_from_sql(sql: str) str

Remove comments from a query.

Parameters:

sql – The query to remove comments from.

Returns:

The sql string without comments.
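
A minimal sketch of comment removal, assuming the two common SQL comment styles (-- to end of line and /* ... */ blocks). strip_sql_comments is a hypothetical helper, not the method's actual implementation, and it does not protect comment markers that appear inside string literals.

```python
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)  # /* ... */, may span lines
_LINE_COMMENT = re.compile(r"--[^\n]*")               # -- up to end of line


def strip_sql_comments(sql: str) -> str:
    """Remove block comments first, then line comments."""
    without_blocks = _BLOCK_COMMENT.sub("", sql)
    return _LINE_COMMENT.sub("", without_blocks)


commented = "SELECT 1 -- one\nFROM t /* FROM hidden */\n"
cleaned = strip_sql_comments(commented)
```

Stripping comments before searching for table names keeps a commented-out reference such as hidden in the example from being reported as a table.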

run_sql_file(*args, **kwargs)
run_sql_string(*args, **kwargs)
table_as_pandas_df(*args, **kwargs)
table_contains_columns(*args, **kwargs)
abstract table_exists(table: str) bool

Checks whether a table exists.

Non-abstract adapters overwrite this method.

Parameters:

table – The table to look for.

table_is_empty(*args, **kwargs)

class algocomponents.adapters.SparkAdapter(**kwargs)

Bases: SQLAdapter

Used to run queries in spark.

Using this adapter in notebooks might not work, because some notebooks automatically set up a spark variable using SparkSession, which this adapter may interfere with. If it does not work, consider creating a new class that overwrites the connect(), is_connected() and disconnect() methods.

Also reads the config file spark_adapter_config.ini in global and local config folders, if present.

connect()

Connects the adapter.

This method imports the pyspark dependencies. Therefore, pyspark does not need to be installed unless this adapter is used.

The connection is stored in self.spark.

count_rows_in_table(*args, **kwargs)
disconnect(*args, **kwargs)

Disconnects the adapter.

Non-abstract adapters extend this method using super().disconnect().

get_table_columns(*args, **kwargs)
insert_pandas_df_into_table(*args, **kwargs)
is_connected() bool

Checks whether the adapter is connected.

As long as self.spark is not None, the adapter is considered connected.

Returns:

True if self.spark is not None, False otherwise.

latest_query_as_csv(path: str)

Get the result of the latest query as a csv file.

Raises NotImplementedError().

Parameters:

path – The path to save the csv file to.

latest_query_as_pandas()

Get the result of the latest query as a pandas dataframe.

Raises NotImplementedError().

pandas_df_as_table(*args, **kwargs)
table_exists(*args, **kwargs)

Checks whether a table exists.

Non-abstract adapters overwrite this method.

Parameters:

table – The table to look for.