app package

Subpackages

Submodules

app.app module

Main application module for ProteoGyver.

This module initializes and configures the Dash application with Celery for long callbacks, sets up logging, creates the navigation bar, and defines the main layout structure.

app.app.celery_app

Celery application instance for handling long callbacks

Type:: Celery

app.app.app

Main Dash application instance

Type:: Dash

app.app.server

Flask server instance from Dash app

Type:: Flask

app.app.logger

Application logger instance

Type:: Logger

app.app.create_navbar(parameters)

Create the application navigation bar.

Parameters:: parameters (dict) – App parameters containing navbar configuration.
Return type:: Navbar
Returns:: Bootstrap Navbar with pages and branding.

app.app.main()

Run the Dash application (main entry point).

Return type:: None

app.app.toggle_navbar_collapse(n, is_open)

Toggle the navbar collapse state.

Parameters:

n (int) – Number of clicks on the toggle button.
is_open (bool) – Current collapse state.

Return type:

bool

Returns:

New collapse state.
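
The toggle logic described above is commonly written as shown below in dash-bootstrap-components applications. This is a minimal sketch of that pattern as a plain function, without the Dash callback decorator; the actual implementation in app.app may differ.

```python
def toggle_navbar_collapse(n, is_open):
    """Flip the collapse state once the toggle button has been clicked."""
    # n is None (or 0) before the first click; keep the current state then.
    if n:
        return not is_open
    return is_open


print(toggle_navbar_collapse(None, False))  # page load, nothing clicked yet
print(toggle_navbar_collapse(1, False))     # first click opens the navbar
```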

app.database_admin module

Administrative entrypoints and helpers for database lifecycle operations.

Tasks include schema creation, snapshotting, external data updates, and periodic cleanup of old versions.

app.database_admin.clean_database(versions_to_keep_dict)

Remove old database directories, keeping a configured number of versions.

Parameters:: versions_to_keep_dict – Mapping with three keys per database name: ‘<name>’ (number of versions to keep), ‘<name>_path’ (path list), and ‘<name>_regex’ (folder-name regex whose capture group 1 sorts by recency).
Return type:: None
Returns:: None.
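
To illustrate the mapping layout and the role of capture group 1, here is a small sketch. The name "snapshots", the folder names, and the helper select_old_versions are hypothetical, and treating the path list as a list of path components is an assumption; the sketch only selects folder names and does not delete anything.

```python
import re

# Hypothetical configuration for one database family called "snapshots".
versions_to_keep = {
    "snapshots": 2,                                 # keep the two newest
    "snapshots_path": ["data", "db", "snapshots"],  # path components (assumed)
    "snapshots_regex": r"^snapshot_(\d{8})$",       # group 1: sortable date
}


def select_old_versions(folder_names, pattern, keep):
    """Return folder names to remove, oldest first, keeping the `keep` newest."""
    regex = re.compile(pattern)
    matched = []
    for name in folder_names:
        m = regex.match(name)
        if m:  # folders that do not match the pattern are left alone
            matched.append((m.group(1), name))
    matched.sort()  # ascending by group 1, so the newest come last
    if keep <= 0:
        return [name for _, name in matched]
    return [name for _, name in matched[:-keep]]


folders = ["snapshot_20240101", "snapshot_20240301", "snapshot_20240201", "notes"]
to_remove = select_old_versions(
    folders, versions_to_keep["snapshots_regex"], versions_to_keep["snapshots"]
)
print(to_remove)
```

Sorting on group 1 as a string works here because the assumed date format (YYYYMMDD) sorts lexicographically in chronological order.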

app.database_admin.create_sqlite_from_schema(schema_file, db_file, overwrite=False, pragmas=('foreign_keys=ON', 'journal_mode=WAL'))

Create a SQLite database from a .sql schema file.

Parameters:

schema_file (str | Path) – Path to the schema file.
db_file (str | Path) – Path of the database to create.
overwrite (bool) – Whether to overwrite an existing DB file.
pragmas (Optional[Iterable[str]]) – PRAGMAs to apply after connecting (e.g., ("foreign_keys=ON",)).

Return type:

Path

Returns:

Absolute path to the created database.

Raises:

FileNotFoundError – If schema_file does not exist.
FileExistsError – If db_file exists and overwrite is False.
sqlite3.Error – If executing the schema fails.
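
The documented behavior can be approximated with the standard sqlite3 module. The following is a sketch under assumptions, not the module's actual code: the function name create_sqlite_from_schema_sketch, the demo schema, and the choice to delete an existing file when overwrite is True are all illustrative.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_sqlite_from_schema_sketch(schema_file, db_file, overwrite=False,
                                     pragmas=("foreign_keys=ON", "journal_mode=WAL")):
    """Create a SQLite database from a .sql file and return its absolute path."""
    schema_path = Path(schema_file)
    db_path = Path(db_file)
    if not schema_path.exists():
        raise FileNotFoundError(f"Schema file not found: {schema_path}")
    if db_path.exists():
        if not overwrite:
            raise FileExistsError(f"Database already exists: {db_path}")
        db_path.unlink()  # assumption: overwrite means start from an empty file
    conn = sqlite3.connect(db_path)
    try:
        for pragma in pragmas or ():
            conn.execute(f"PRAGMA {pragma}")
        # executescript runs every statement in the schema file in order.
        conn.executescript(schema_path.read_text(encoding="utf-8"))
        conn.commit()
    finally:
        conn.close()
    return db_path.resolve()


# Demo with a throwaway schema in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    schema = Path(tmp) / "schema.sql"
    schema.write_text("CREATE TABLE proteins (uniprot_id TEXT PRIMARY KEY);",
                      encoding="utf-8")
    db_path = create_sqlite_from_schema_sketch(schema, Path(tmp) / "demo.db")
    is_absolute = db_path.is_absolute()
    conn = sqlite3.connect(db_path)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    conn.close()
    try:
        create_sqlite_from_schema_sketch(schema, db_path)  # overwrite=False
        refused_overwrite = False
    except FileExistsError:
        refused_overwrite = True
print(tables, refused_overwrite)
```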

app.database_admin.get_external_versions(conn, externals)

Get the versions of the external databases.

Return type:: dict

app.database_admin.last_update(conn, uptype, interval, time_format)

Return the last update time for a given update type or a default.

If the log lookup fails, the function defaults to the current time minus interval seconds.

Parameters:

conn (Connection) – SQLite database connection.
uptype (str) – Update type label to query (e.g., ‘external’).
interval (int) – Interval in seconds to compute a safe default.
time_format (str) – Timestamp format string used in the log table.

Return type:

datetime

Returns:

Datetime of the last update or a computed default.
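
The fallback behavior can be sketched as follows. The table name update_log and its columns timestamp and update_type are hypothetical, since the log table's schema is not documented here; the sketch also assumes the timestamp format sorts chronologically as text.

```python
import sqlite3
from datetime import datetime, timedelta


def last_update_sketch(conn, uptype, interval, time_format):
    """Return the newest logged time for `uptype`, or now minus `interval` s."""
    try:
        row = conn.execute(
            "SELECT MAX(timestamp) FROM update_log WHERE update_type = ?",
            (uptype,),
        ).fetchone()
        # row[0] is None when no entry matches; strptime then raises TypeError.
        return datetime.strptime(row[0], time_format)
    except (sqlite3.Error, TypeError, ValueError):
        return datetime.now() - timedelta(seconds=interval)


fmt = "%Y-%m-%d %H:%M:%S"
conn = sqlite3.connect(":memory:")
fallback = last_update_sketch(conn, "external", 3600, fmt)  # no log table yet
conn.execute("CREATE TABLE update_log (timestamp TEXT, update_type TEXT)")
conn.execute("INSERT INTO update_log VALUES ('2024-05-01 12:00:00', 'external')")
logged = last_update_sketch(conn, "external", 3600, fmt)
print(logged)
```

Returning a time one interval in the past means a caller that compares elapsed time against the interval will treat a missing log as "update due".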

app.database_admin.main()

Main entry point for database administration.

Returns:: None.

app.database_updater module

Utilities to update and synchronize the SQLite database from TSV inputs and external APIs (UniProt, IntAct, BioGRID).

This module provides helpers for:

- Creating TSV-based inserts/updates with schema reconciliation
- Merging interaction datasets and exporting incremental updates
- Recording update logs and packaging outputs

app.database_updater.get_dataframe_differences(df1, df2, ignore_columns=None)

Compare two DataFrames and return modified/new indices and missing indices.

Columns listed in ignore_columns are dropped before comparison. The two DataFrames must have identical columns after dropping.

Parameters:

df1 (DataFrame) – Baseline DataFrame.
df2 (DataFrame) – New DataFrame to compare against baseline.
ignore_columns (list[str] | None) – Columns to ignore during comparison.

Return type:

tuple[list[str], list[str]]

Returns:

Tuple of (new_or_modified_indices, missing_indices).
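
The return value can be illustrated with a plain-Python analogue in which each "DataFrame" is a dict mapping an index label to a row dict. This is not the pandas implementation: the helper get_row_differences and the sample data are hypothetical, and reading "missing" as "present in the baseline but absent from the new data" is an assumption.

```python
def get_row_differences(rows1, rows2, ignore_columns=None):
    """Compare two {index: {column: value}} mappings, ignoring some columns."""
    ignored = set(ignore_columns or [])

    def strip(row):
        # Drop ignored columns before comparing, as the documentation describes.
        return {col: val for col, val in row.items() if col not in ignored}

    new_or_modified = [
        idx for idx, row in rows2.items()
        if idx not in rows1 or strip(rows1[idx]) != strip(row)
    ]
    missing = [idx for idx in rows1 if idx not in rows2]
    return new_or_modified, missing


baseline = {
    "P1": {"gene": "A", "updated": "2023"},
    "P2": {"gene": "B", "updated": "2023"},
    "P3": {"gene": "C", "updated": "2023"},
}
new = {
    "P1": {"gene": "A", "updated": "2024"},   # differs only in an ignored column
    "P2": {"gene": "B2", "updated": "2024"},  # modified
    "P4": {"gene": "D", "updated": "2024"},   # new
}
changed, missing = get_row_differences(baseline, new, ignore_columns=["updated"])
print(changed, missing)
```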

app.database_updater.handle_merg_chunk(existing, organisms, timestamp, L, last_update_date, odir, parameters)

Merge IntAct and BioGRID chunks, writing new and modified interactions.

Parameters:

existing (DataFrame) – Existing interactions DataFrame (index ‘interaction’).
organisms (set | None) – Optional set of organism IDs to include.
timestamp (str) – Current update timestamp string.
L (str) – Chunk prefix letter.
last_update_date (datetime | None) – Optional cutoff date for remote queries.
odir (str) – Output directory for TSVs.
parameters (dict) – Updater parameters including ‘Ignore diffs’ and paths.

Return type:

None

Returns:

None.

app.database_updater.handle_mods(check_for_mods, existing, timestamp, L, parameters, odir)

Write modified interactions to TSV and optionally queue deletions.

Parameters:

check_for_mods – List of candidate modified interaction dicts.
existing – Existing interactions DataFrame (indexed by ‘interaction’).
timestamp – Current update timestamp string.
L – Chunk prefix/letter for file naming.
parameters – Updater parameters including deletion settings.
odir – Output directory.

Return type:

None

Returns:

None.

app.database_updater.handle_new(new_interactions, odir, timestamp, L)

Write new interactions to a timestamped TSV file in the output directory.

Parameters:

new_interactions – List of interaction dicts keyed by ‘interaction’.
odir – Output directory.
timestamp – Current update timestamp string.
L – Chunk prefix/letter for file naming.

Return type:

None

Returns:

None.

app.database_updater.merge_multiple_string_dataframes(dfs)

Merge DataFrames with semicolon-separated string fields by union of values.

Each cell is split on ‘;’ and de-duplicated across dataframes; merged rows are indexed by ‘interaction’.

Parameters:: dfs (list[DataFrame]) – List of input DataFrames.
Return type:: DataFrame
Returns:: Merged DataFrame with unioned semicolon-joined values.
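
The union-of-values merge can be sketched without pandas by treating each input as a dict keyed by interaction. The helper names, the sample interactions, and the sorted order of the joined values are assumptions made for illustration; the real function operates on DataFrames and may order values differently.

```python
def union_semicolon_values(*cells):
    """Split each cell on ';', pool the distinct values, and join them again."""
    values = set()
    for cell in cells:
        if cell:
            values.update(part.strip() for part in cell.split(";") if part.strip())
    return ";".join(sorted(values))  # sorted only to make the output stable


def merge_rows(tables):
    """Merge a list of {interaction: {column: 'a;b'}} mappings by union."""
    merged = {}
    for table in tables:
        for interaction, row in table.items():
            target = merged.setdefault(interaction, {})
            for column, cell in row.items():
                target[column] = union_semicolon_values(target.get(column, ""), cell)
    return merged


intact = {"P1_P2": {"source": "IntAct", "pmid": "111;222"}}
biogrid = {
    "P1_P2": {"source": "BioGRID", "pmid": "222;333"},
    "P3_P4": {"source": "BioGRID", "pmid": "444"},
}
merged = merge_rows([intact, biogrid])
print(merged["P1_P2"])
```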

app.database_updater.stream_flattened_rows(df)

Yield rows as dictionaries with set-unioned values split on ‘;’.

Parameters:: df (DataFrame) – Input DataFrame; uses ‘interaction’ as index when present.
Return type:: Iterator[dict]
Returns:: Iterator of dict rows with ‘interaction’ key and set values per column.

app.database_updater.update_database(conn, parameters, cc_cols, cc_types, timestamp)

Update multiple database tables using TSV files from configured directories.

Parameters:

conn – SQLite database connection.
parameters – Parameters with ‘Update files’ table→directory mappings and limits.
cc_cols – Expected column names for creating fresh tables.
cc_types – SQL column types aligned with cc_cols.
timestamp – Current update timestamp string.

Returns:

Tuple of (inmod_names, inmod_vals) listing counts per table and action.

app.database_updater.update_external_data(conn, parameters, timestamp, organisms=None, last_update_date=None, versions=None, ncpu=1)

Update external data tables (UniProt and known interactions).

Parameters:

conn – SQLite database connection.
parameters – Updater parameters; includes external update intervals.
timestamp – Current update timestamp string.
organisms (set | None) – Optional set of organism IDs to update.
last_update_date (datetime | None) – Cutoff date; ignore data older than this.
ncpu (int) – Number of CPUs to use for merging chunks.

Returns:

None.

app.database_updater.update_knowns(conn, parameters, timestamp, uniprots, organisms, versions, last_update_date=None, ncpu=1)

Update known interaction TSVs in parallel by merging external sources.

Parameters:

conn – SQLite connection for reading existing interactions.
parameters – Updater parameters with file paths.
timestamp – Current update timestamp string.
uniprots – Set of UniProt IDs to filter by.
organisms – Optional set of organism IDs to include.
versions (dict[str, list[str] | str]) – Dictionary of versions for each external source.
last_update_date (datetime | None) – Cutoff datetime; ignore older remote entries.
ncpu (int) – Number of worker processes.

Return type:

list[tuple[str, str]]

Returns:

List of new versions for each external source.

app.database_updater.update_log_table(conn, inmod_names, inmod_vals, timestamp, uptype)

Record database update info in a log table.

Parameters:

conn – SQLite database connection.
inmod_names – Names like ‘table action’ (e.g., ‘proteins insertions’).
inmod_vals – Counts aligned with inmod_names.
timestamp – Update timestamp string.
uptype (str) – Update category label (e.g., ‘external’, ‘snapshot’).

Return type:

None

Returns:

None.
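
One way to record the aligned names and counts is one row per entry, as sketched below. The table update_log and its columns are hypothetical; the real log table may use a different layout, such as one column per table/action pair.

```python
import sqlite3


def update_log_table_sketch(conn, inmod_names, inmod_vals, timestamp, uptype):
    """Store one log row per (name, count) pair for this update run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS update_log ("
        "timestamp TEXT, update_type TEXT, item TEXT, n_rows INTEGER)"
    )
    conn.executemany(
        "INSERT INTO update_log VALUES (?, ?, ?, ?)",
        [(timestamp, uptype, name, value)
         for name, value in zip(inmod_names, inmod_vals)],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
update_log_table_sketch(
    conn,
    ["proteins insertions", "proteins modifications"],
    [12, 3],
    "2024-05-01 12:00:00",
    "external",
)
logged_rows = conn.execute(
    "SELECT item, n_rows FROM update_log ORDER BY item"
).fetchall()
print(logged_rows)
```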

app.database_updater.update_table_with_file(cursor, table_name, file_path, parameters, timestamp, add_info='')

Update a table with data from a TSV file, adding columns if needed.

Parameters:

cursor – SQLite database cursor.
table_name – Target table name.
file_path – Path to the TSV file with new data.
parameters – Configuration parameters; expects keys like ‘Allowed new columns’, ‘Allowed missing columns’, ‘Ignore diffs’.
timestamp – Current update timestamp string.
add_info (str) – Optional progress info for logging.

Returns:

Tuple of (insertions, modifications).

Raises:

ValueError – If too many new or missing columns are detected.
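
The column check behind the ValueError can be sketched as below. It assumes that ‘Allowed new columns’ and ‘Allowed missing columns’ are integer limits, which the documentation does not state; the helper check_columns and the column names are hypothetical.

```python
def check_columns(table_columns, file_columns, allowed_new, allowed_missing):
    """Return (new, missing) columns, raising if either exceeds its limit."""
    new = [col for col in file_columns if col not in table_columns]
    missing = [col for col in table_columns if col not in file_columns]
    if len(new) > allowed_new:
        raise ValueError(f"Too many new columns ({len(new)}): {new}")
    if len(missing) > allowed_missing:
        raise ValueError(f"Too many missing columns ({len(missing)}): {missing}")
    return new, missing


new_cols, missing_cols = check_columns(
    table_columns=["id", "gene", "organism"],
    file_columns=["id", "gene", "length"],
    allowed_new=1,
    allowed_missing=1,
)
print(new_cols, missing_cols)
```

Columns reported as new would then be added to the table before the rows are inserted or updated, which is the "adding columns if needed" step in the description.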

app.database_updater.update_uniprot(conn, parameters, timestamp, versions, organisms=None)

Download and stage UniProt data, writing TSV updates if differences found.

Parameters:

conn – SQLite database connection.
parameters – Updater parameters (paths, ignore diffs, deletion policy).
timestamp – Current update timestamp string.
versions (list) – List of current versions of the UniProt database.
organisms (set | None) – Optional set of organism IDs to include.

Returns:

Set of UniProt IDs present in the fetched dataset.

app.database_updater.update_version_table(conn, dataset, timestamp, new_versions)

Update the version table with the new versions.

Return type:: None

app.element_styles module

Styles for Dash interface elements.

Defines style dictionaries used throughout the UI, including sidebar, content area, upload components, and status indicators.

app.embedded_page_updater module

Module for creating embedded page files from a list of websites.

This module reads a text file containing website names and URLs, then generates Dash pages that embed these websites using html.Embed.

Limitations:

Not all sites can be embedded, because a site's content security policy (CSP) may forbid it.
HTTPS support is untested but may help with embedding restrictions.
Successfully tested only with:
- Sites served from the same server
- www.proteomics.fi
All testing has been done without HTTPS.

app.embedded_page_updater.create_page_file(output_dir, site_name, url)

Create a new Dash page file for embedding a website.

Parameters:

output_dir (str) – Directory where the page file should be created.
site_name (str) – Name of the website (used for the page title and route).
url (str) – URL of the website to embed.

Return type:

None

Returns:

None.
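
A generated page file might look like the output of the sketch below. The template, the route and file naming scheme, and the style values are assumptions; only the use of html.Embed comes from the module description. The sketch writes the page source as text and does not import Dash itself.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical template for a generated Dash page.
PAGE_TEMPLATE = '''# Embedded page for {title} (generated file).
import dash
from dash import html

dash.register_page(__name__, path="/{route}", name={title!r})

layout = html.Embed(src={url!r}, style={{"width": "100%", "height": "90vh"}})
'''


def create_page_file_sketch(output_dir, site_name, url):
    """Write a Dash page module that embeds `url` under a route from the name."""
    # Reduce the site name to lowercase letters, digits, and underscores.
    route = re.sub(r"[^a-z0-9]+", "_", site_name.lower()).strip("_")
    target = Path(output_dir) / f"{route}.py"
    target.write_text(
        PAGE_TEMPLATE.format(title=site_name, route=route, url=url),
        encoding="utf-8",
    )


with tempfile.TemporaryDirectory() as tmp:
    create_page_file_sketch(tmp, "Proteomics FI", "http://www.proteomics.fi")
    generated = (Path(tmp) / "proteomics_fi.py").read_text(encoding="utf-8")
print(generated)
```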

app.embedded_page_updater.parse_embed_file(filename)

Parse the embed configuration file containing website names and URLs.

Parameters:: filename (str) – Path to the text file containing site information.
Return type:: List[Tuple[str, str]]
Returns:: List of (site_name, url) tuples.
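
The file format is not specified here. The sketch below assumes one site per line, with the URL as the last whitespace-separated token and everything before it as the site name, plus optional "#" comment lines; the real parser may expect a different layout.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def parse_embed_file_sketch(filename: str) -> List[Tuple[str, str]]:
    """Read (site_name, url) pairs from a text file, one site per line."""
    sites = []
    with open(filename, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments (assumed convention)
            parts = line.rsplit(None, 1)  # split off the last token as the URL
            if len(parts) != 2:
                continue  # a line needs both a name and a URL
            name, url = parts
            sites.append((name, url))
    return sites


with tempfile.TemporaryDirectory() as tmp:
    embed_file = Path(tmp) / "embed_pages.txt"
    embed_file.write_text(
        "# name url\n"
        "Proteomics FI http://www.proteomics.fi\n"
        "\n"
        "Local docs http://localhost:8080/docs\n",
        encoding="utf-8",
    )
    sites = parse_embed_file_sketch(str(embed_file))
print(sites)
```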

app.embedded_page_updater.update_pages(output_dir, embed_file)

Update embedded pages based on the configuration file.

Parameters:

output_dir (str) – Directory where page files should be created.
embed_file (str) – Path to the text file containing site information.

Return type:

None

Returns:

None.

app.run_as_pipeline module

ProteoGyver Batch Pipeline

This script runs the complete batch pipeline on the same infrastructure as the GUI, so batch runs behave identically to interactive runs and there is a single code path to maintain.

app.run_as_pipeline.main()

Command line interface for the batch pipeline.

Returns:: None.

app.run_as_pipeline.run_batch_pipeline(toml_path)

Run the complete batch pipeline.

Parameters:: toml_path (str) – Path to the TOML configuration file.
Return type:: dict
Returns:: Summary dict with execution details, export paths, and figures info.

Module contents

ProteoGyver - A web-based platform for proteomics and interactomics data analysis.