utils
sc_crawler.utils
jsoned_hash
Hash the JSON-dump of all positional and keyword arguments.
Examples:
>>> jsoned_hash(42)
'0211c62419aece235ba19582d3cf7fd8e25f837c'
>>> jsoned_hash(everything=42)
'8f8a7fcade8cb632b856f46fc64c1725ee387617'
>>> jsoned_hash(42, 42, everything=42)
'f04a77f000d85929b13de04b436c60a1272dfbf5'
Source code in sc_crawler/utils.py
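The idea behind jsoned_hash can be illustrated with a minimal sketch. It assumes the positional and keyword arguments are wrapped in a single dict and dumped with sorted keys before SHA-1 hashing. That payload layout is an assumption, so the digests it produces are not guaranteed to match the doctest values above. It only shows why a JSON dump gives a stable, order-independent hash of the arguments.

```python
import hashlib
import json


def jsoned_hash(*args, **kwargs) -> str:
    """Sketch: SHA-1 of a JSON dump of all positional and keyword arguments."""
    # Assumption: both argument groups go into one dict, dumped with sorted
    # keys so that the order of keyword arguments does not change the digest.
    payload = json.dumps({"args": args, "kwargs": kwargs}, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


print(jsoned_hash(42))
print(jsoned_hash(a=1, b=2) == jsoned_hash(b=2, a=1))  # True
```

The arguments must be JSON-serializable. Positional arguments are serialized as a JSON list, so their order does matter.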
hash_database
hash_database(connection_string, level=HashLevels.DATABASE, ignored=['observed_at'], progress=None, exclude_tables=[])
Hash the content of a database.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
connection_string | str | SQLAlchemy connection string to connect to the database. | required |
level | HashLevels | The level at which to apply hashing. Possible values are 'DATABASE' (default), 'TABLE', or 'ROW'. | DATABASE |
ignored | List[str] | List of column names to be ignored during hashing. | ['observed_at'] |
progress | Optional[Progress] | Optional progress bar to track the status of the hashing. | None |
exclude_tables | List[ScModel] | Optional list of tables not to be hashed. | [] |
Returns:
Type | Description |
---|---|
Union[str, dict] | A single SHA1 hash or dict of hashes, depending on the level. |
Source code in sc_crawler/utils.py
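The three hashing levels can be pictured with a stdlib-only sketch against SQLite. The function name hash_sqlite and its arguments are hypothetical. This is not the sc_crawler implementation, which connects through SQLAlchemy and works on ScModel tables. In the sketch, rows are hashed without the ignored columns ('ROW'), each table hash is derived from its sorted row hashes ('TABLE'), and the database hash is derived from the table hashes ('DATABASE').

```python
import hashlib
import json
import sqlite3
from typing import Dict, List, Sequence, Union


def _sha1(obj) -> str:
    # Stable JSON dump (sorted keys); default=str keeps e.g. bytes serializable.
    return hashlib.sha1(
        json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    ).hexdigest()


def hash_sqlite(
    conn: sqlite3.Connection,
    level: str = "DATABASE",
    ignored: Sequence[str] = ("observed_at",),
) -> Union[str, Dict[str, Union[str, List[str]]]]:
    """Hypothetical sketch: hash a SQLite database at ROW, TABLE or DATABASE level."""
    if level not in ("DATABASE", "TABLE", "ROW"):
        raise ValueError(f"unknown level: {level}")
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    row_hashes: Dict[str, List[str]] = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = [col[0] for col in cursor.description]
        # Hash each row without the ignored columns, then sort the hashes so
        # the order in which the database returns rows does not matter.
        row_hashes[table] = sorted(
            _sha1({c: v for c, v in zip(columns, values) if c not in ignored})
            for values in cursor
        )
    if level == "ROW":
        return row_hashes
    table_hashes = {table: _sha1(hashes) for table, hashes in row_hashes.items()}
    if level == "TABLE":
        return table_hashes
    return _sha1(table_hashes)
```

Because observed_at is ignored by default, two snapshots that differ only in their observation timestamps produce the same hash.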
chunk_list
Split a list into chunks of a specified size.
Examples:
Source code in sc_crawler/utils.py
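A generator is a natural way to split a list into chunks of a fixed size. The following minimal sketch assumes this signature (items, size) and generator-based behaviour. Neither is confirmed by the text above. The sketch yields consecutive slices of at most `size` elements.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_list(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Sketch: yield consecutive chunks of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        # The last chunk is shorter when len(items) is not a multiple of size.
        yield list(items[start : start + size])


print(list(chunk_list([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Since this is a generator, the size check runs on the first iteration, not when the function is called.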
scmodels_to_dict
Create a dict of the ScModels in the list, indexed by the given key(s).
When multiple keys are provided, each ScModel instance is stored in the dict under all of its key values. If a key's value is a list, each element of that list (at the first level only, not recursively) is used as a separate key. Key conflicts are not checked.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
scmodels | List[ScModel] | List of ScModel instances. | required |
keys | List[str] | List of strings referring to ScModel fields to be used as keys. | required |
Examples:
>>> from sc_crawler.vendors import aws
>>> scmodels_to_dict([aws], keys=["vendor_id", "name"])
{'aws': Vendor...
Source code in sc_crawler/utils.py
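The indexing rules described above can be shown with a sketch that uses a plain dataclass in place of an ScModel. The Vendor class and its aliases field here are hypothetical stand-ins, not the real sc_crawler model. The sketch shows an object stored under every key, list values expanded one level, and conflicting keys silently overwritten.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vendor:
    # Hypothetical stand-in for an ScModel with a list-valued field.
    vendor_id: str
    name: str
    aliases: List[str] = field(default_factory=list)


def scmodels_to_dict(scmodels: List[Any], keys: List[str]) -> Dict[Any, Any]:
    """Sketch: index objects by the value(s) of the given fields."""
    data: Dict[Any, Any] = {}
    for key in keys:
        for scmodel in scmodels:
            values = getattr(scmodel, key)
            # A list value contributes each of its elements as a key
            # (first level only); any other value is used as-is.
            if not isinstance(values, list):
                values = [values]
            for value in values:
                data[value] = scmodel  # conflicts are not checked: last one wins
    return data


aws = Vendor(vendor_id="aws", name="Amazon Web Services", aliases=["amazon", "AWS"])
print(sorted(scmodels_to_dict([aws], keys=["vendor_id", "aliases"])))
# ['AWS', 'amazon', 'aws']
```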
is_sqlite
is_postgresql
Check if a SQLModel session is bound to a PostgreSQL-like database.
The dialect name is checked for PostgreSQL or CockroachDB.
float_inf_to_str
table_name_to_model
get_row_by_pk
Get a row from a table definition by primary keys.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
session | Session | Database session used to run the query. | required |
model | ScModel | An ScModel schema definition with table reference. | required |
pks | dict | Dictionary of all the primary keys for the row. | required |
Returns:
Type | Description |
---|---|
ScModel | ScModel object read from the database. |
Source code in sc_crawler/utils.py
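A lookup by a composite primary key filters on every key column at once. The following sketch is a stdlib analogue, not the sc_crawler function, which takes an SQLModel session and an ScModel. The name get_row_by_pk_sqlite, its plain table-name argument, and its "exactly one row" rule are assumptions. The sketch builds one equality condition per entry in `pks` and binds the values as query parameters.

```python
import sqlite3
from typing import Any, Dict


def get_row_by_pk_sqlite(
    conn: sqlite3.Connection, table: str, pks: Dict[str, Any]
) -> Dict[str, Any]:
    """Hypothetical stdlib analogue: fetch exactly one row matching all primary keys."""
    if not pks:
        raise ValueError("at least one primary key value is required")
    # Column names are quoted identifiers; values are bound as parameters.
    where = " AND ".join(f'"{col}" = ?' for col in pks)
    cursor = conn.execute(f'SELECT * FROM "{table}" WHERE {where}', tuple(pks.values()))
    columns = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    # Assumption: anything other than exactly one match is treated as an error.
    if len(rows) != 1:
        raise LookupError(f"expected exactly one row, found {len(rows)}")
    return dict(zip(columns, rows[0]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price (vendor_id TEXT, server_id TEXT, value REAL,"
    " PRIMARY KEY (vendor_id, server_id))"
)
conn.execute("INSERT INTO price VALUES ('vendor_a', 'server_1', 1.5)")
print(get_row_by_pk_sqlite(conn, "price", {"vendor_id": "vendor_a", "server_id": "server_1"}))
```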
nesteddefaultdict
list_search
Search for a dict in a list with the given key/value pair.
When multiple values are provided, the first dict whose field matches any of the values is returned.
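The search described above can be written as a single pass over the list. This minimal sketch makes three assumptions that the text does not confirm. The signature is (items, key, values). A single value is treated like a one-element list. None means "not found". Matches are prioritized by position in `items`, not by the order of `values`.

```python
from typing import Any, Dict, List, Optional, Union


def list_search(
    items: List[Dict[str, Any]], key: str, values: Union[Any, List[Any]]
) -> Optional[Dict[str, Any]]:
    """Sketch: return the first dict whose `key` field matches any of `values`."""
    if not isinstance(values, list):
        values = [values]
    for item in items:
        # Dicts without the field are skipped instead of raising KeyError.
        if key in item and item[key] in values:
            return item
    # Assumption: None signals "not found" (the real helper may behave differently).
    return None


zones = [{"id": "a", "name": "x"}, {"id": "b", "name": "y"}]
print(list_search(zones, "name", ["z", "y"]))  # {'id': 'b', 'name': 'y'}
```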