algocomponents
Subpackages
- algocomponents.adapters
- Submodules
- algocomponents.adapters.custom_exceptions
- Contents
- BigQueryAdapter: adapter_specific_filters(), connect(), count_rows_in_table(), disconnect(), get_table_columns(), insert_pandas_df_into_table(), is_connected(), latest_query_as_csv(), latest_query_as_pandas(), pandas_df_as_table(), pandas_df_helper_method(), remove_extract_method_calls_from_sql(), remove_ml_methods_from_sql(), remove_unnest_method_calls_from_sql(), table_exists()
- DatabricksAdapter
- LocalSqliteAdapter: connect(), count_rows_in_table(), db_file, disconnect(), get_table_columns(), insert_pandas_df_into_table(), is_connected(), latest_query_as_csv(), latest_query_as_pandas(), pandas_df_as_table(), table_exists()
- SQLAdapter: adapter_specific_filters(), connect(), count_rows_in_table(), default_max_rows_displayed, disconnect(), find_possible_cte_names(), find_table_names(), get_formatted_queries(), get_table_columns(), insert_pandas_df_into_table(), is_connected(), latest_query_as_csv(), latest_query_as_pandas(), pandas_df_as_table(), remove_comments_from_sql(), run_sql_file(), run_sql_string(), table_as_pandas_df(), table_contains_columns(), table_exists(), table_is_empty()
- SparkAdapter: connect(), count_rows_in_table(), disconnect(), get_table_columns(), insert_pandas_df_into_table(), is_connected(), latest_query_as_csv(), latest_query_as_pandas(), pandas_df_as_table(), table_exists()
- algocomponents.tasks
- Subpackages
- algocomponents.tasks.check_significance
- algocomponents.tasks.downsample_table
- algocomponents.tasks.evaluate_prediction
- algocomponents.tasks.model_trainers
- Submodules
- algocomponents.tasks.model_trainers.model_trainer
- algocomponents.tasks.model_trainers.gradient_boosting_classifier_trainer
- algocomponents.tasks.model_trainers.linear_regression_trainer
- algocomponents.tasks.model_trainers.random_forest_classifier_trainer
- algocomponents.tasks.model_trainers.xgboost_classifier_trainer
- algocomponents.tasks.stratify_groups
- algocomponents.tasks.task_verifier
- algocomponents.utils
Submodules
algocomponents.config_reader module
- class algocomponents.config_reader.ConfigReader(global_config_dir: str = 'config', local_config_dir: str = 'config', config_files: List[str] = None, config: ConfigParser = None, section: str = None)
Bases: ABC

A class used to read and manage configs.
The config files are Python config.ini files. The priority is as follows, starting with the highest (in terms of what overwrites what):
- Adding to config via code
- Passed config
- Local config
- Global config
The rule of thumb is “Code over files, specific beats general”.
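As a rough sketch of the file-layering part of this (setting the passed-config and code layers aside), configparser itself merges files in read order, so the later, more specific file wins on conflicts; the paths below are hypothetical:

```python
from configparser import ConfigParser

parser = ConfigParser()
# Later files overwrite values from earlier ones,
# so the local (more specific) config beats the global one.
parser.read([
    "config/config.ini",          # global config
    "my_task/config/config.ini",  # local config (hypothetical path)
])
```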
The class also creates a unique logger which can be used by classes that inherit from this class. By default the log is written to a file called log.log in the project root and printed to the terminal. The log is configurable via these fields in the config:
- log_to_file: True or False, decides whether the log is also written to a file
- log_level: At what level to log (DEBUG, INFO, WARNING, ERROR, CRITICAL)
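For example, a config.ini could contain the following (the section name is a made-up placeholder):

```ini
[my_task]
log_to_file = True
log_level = INFO
```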
- Parameters:
global_config_dir – Path from project root to the global config.ini file.
local_config_dir – Relative path to the local config.ini file.
config_files – Names of config files to read. Values from files later in the list overwrite those from earlier files. Defaults to ["config.ini"].
config – A passed ConfigParser object, which overwrites any files read.
section – Which section of the config should be read from.
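A minimal usage sketch, assuming a concrete subclass (ConfigReader is abstract) and hypothetical paths and section names:

```python
from algocomponents.config_reader import ConfigReader

class MyTaskReader(ConfigReader):
    """Hypothetical concrete subclass, for illustration only."""

reader = MyTaskReader(
    global_config_dir="config",
    local_config_dir="my_task/config",
    config_files=["config.ini", "overrides.ini"],
    section="my_task",
)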
- add_to_config(key, value)
Add or update a value in the config for the current section.
- Parameters:
key – Which key to add or update
value – What value to give the key
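Continuing the hypothetical reader above, values added via code take the highest priority (“code over files”):

```python
reader.add_to_config("output_table", "{tmp_db}.output_table")
```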
- format_string(string: str, additional_format_variables: Dict[str, str] = None, max_depth: int = 5) → str
Recursively applies .format() to a string, using a dict of format variables, until the string no longer changes.
A max depth is used because a more general approach would have to solve self-referencing format variables. For example, {"a": "{b}", "b": "{a}"} causes "{a}" to format into "{b}", which formats back into "{a}", and so on. There are solutions, but the added code complexity was deemed not worth it.
- Parameters:
string – The string to format.
additional_format_variables – A dictionary of format variables used to .format() the string.
max_depth – Max number of times .format() will be done.
- Returns:
The provided string, formatted.
- Raises:
RecursionError – If the string is still changing after .format() has been applied max_depth times.
Examples
{output_table} -> {tmp_db}.output_table -> tmp.output_table
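A minimal sketch of the described behaviour (not the actual implementation), with format variables mirroring the example above:

```python
def format_until_stable(string: str, variables: dict, max_depth: int = 5) -> str:
    """Re-apply str.format() until the string stops changing."""
    for _ in range(max_depth):
        formatted = string.format(**variables)
        if formatted == string:
            return formatted
        string = formatted
    raise RecursionError("String still changing after max_depth formats")

variables = {"output_table": "{tmp_db}.output_table", "tmp_db": "tmp"}
print(format_until_stable("{output_table}", variables))  # -> tmp.output_table
```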
- update_logger()
Update the logger with new config settings.
Currently this duplicates code, since the fetch_logger() method itself already returns the same logger when called with the same settings. Having this as a separate method is necessary nonetheless, and it will likely stop being duplication over time, as the use cases “setting up logging” and “updating logging” are different.
- verify_section_is_in_config(section)
Verifies that the section exists in the config for this class.
- Raises:
ValueError – If the section is not in the config
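For example, with the hypothetical reader from above:

```python
reader.verify_section_is_in_config("my_task")  # raises ValueError if missing
```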