MSParser Package

The MSParser package provides utilities for pre-analysis of MS raw data files, including parsers for Thermo and timsTOF instruments.

Submodules

MSParser.ms_name_identifier module

utils.MSParser.ms_name_identifier.identify(raw_file_path)

Return type: str

MSParser.MSparser module

utils.MSParser.MSparser.analyze(filename, outdir, errorfile)

utils.MSParser.MSparser.calculate_auc(ser)

utils.MSParser.MSparser.count_intercepts(xydata)

utils.MSParser.MSparser.handle_data(data_dict, oname)

utils.MSParser.MSparser.main()

utils.MSParser.MSparser.parse_d(root, filename, run_id_regex)

utils.MSParser.MSparser.parse_raw(root, filename)

utils.MSParser.MSparser.read_toml(toml_file)

utils.MSParser.MSparser.remove_accent_characters(text)

Replace accented characters with their unaccented equivalents.

Parameters: text (str) – Input string containing accented characters.
Return type: str
Returns: String with accented characters replaced by unaccented equivalents.
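
One common way to get this behaviour is Unicode decomposition. The sketch below is an assumption about the approach, not the package's actual implementation, and `remove_accents_sketch` is a hypothetical name. It decomposes each character into a base letter plus combining marks and then drops the marks.

```python
import unicodedata


def remove_accents_sketch(text: str) -> str:
    """Hypothetical stand-in for remove_accent_characters."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents_sketch("Héllo Wörld"))  # Hello World
```

Letters that have no decomposition, such as "ø", pass through unchanged with this approach; the real function may handle them differently.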

utils.MSParser.MSparser.remove_rawfile_ending(column_name)

Remove the raw file ending from a column name. For example, the column name ‘run1.raw’ becomes ‘run1’.

Return type: str
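
A minimal sketch of the behaviour described above, under the assumption that only a trailing `.raw` is removed and that the match ignores case. `strip_raw_ending_sketch` is a hypothetical name; the package's own function may recognise other endings.

```python
def strip_raw_ending_sketch(column_name: str) -> str:
    """Hypothetical stand-in for remove_rawfile_ending."""
    suffix = ".raw"
    # Assumption: the comparison is case-insensitive, so ".RAW" also matches.
    if column_name.lower().endswith(suffix):
        return column_name[: -len(suffix)]
    # Names without the ending are returned unchanged.
    return column_name


print(strip_raw_ending_sketch("run1.raw"))  # run1
```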

utils.MSParser.MSparser.replace_special_characters(text, replacewith='.', dict_and_re=False, replacement_dict=None, stripresult=True, remove_duplicates=False, make_lowercase=True, allow_numbers=True, allow_space=False, mask_first_digit=None)

Replace special characters in a string with specified replacements.

Parameters:

text (str) – Input string containing special characters.
replacewith (str) – Character to use for replacement.
dict_and_re (bool) – Whether to apply both dictionary replacements and regex.
replacement_dict (Optional[Dict[str, str]]) – Mapping of specific substrings to replacements.
stripresult (bool) – Strip whitespace and replacement characters from result.
remove_duplicates (bool) – Collapse consecutive replacement characters.
make_lowercase (bool) – Convert result to lowercase.
allow_numbers (bool) – Allow numbers in the result.
allow_space (bool) – Allow spaces in the result.
mask_first_digit (str | None) – Character to prepend when the first character is a digit.

Return type: str
Returns: String with special characters replaced.
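
To make the interaction of the flags concrete, here is a simplified sketch. It is not the package's implementation: `replace_special_sketch` is a hypothetical name, the `dict_and_re` and `replacement_dict` parameters are left out, and the order in which the options are applied is an assumption.

```python
import re


def replace_special_sketch(text, replacewith=".", stripresult=True,
                           remove_duplicates=False, make_lowercase=True,
                           allow_numbers=True, allow_space=False,
                           mask_first_digit=None):
    """Hypothetical, reduced stand-in for replace_special_characters."""
    # Build the set of characters that are kept as-is.
    allowed = "a-zA-Z"
    if allow_numbers:
        allowed += "0-9"
    if allow_space:
        allowed += " "
    # Replace everything else; the lambda avoids backslash handling
    # in the replacement string.
    result = re.sub(f"[^{allowed}]", lambda _m: replacewith, text)
    if remove_duplicates and replacewith:
        # Collapse runs of the replacement into a single occurrence.
        pattern = f"(?:{re.escape(replacewith)})+"
        result = re.sub(pattern, lambda _m: replacewith, result)
    if stripresult:
        # Trim whitespace and replacement characters at both ends.
        result = result.strip().strip(replacewith)
    if make_lowercase:
        result = result.lower()
    if mask_first_digit is not None and result[:1].isdigit():
        result = mask_first_digit + result
    return result


print(replace_special_sketch("Sample #1 (rep A)", remove_duplicates=True))
# sample.1.rep.a
```

With `replacewith="_"` and `mask_first_digit="x"`, the same sketch turns "1st run" into "x1st_run".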

MSParser.parse_thermo module

utils.MSParser.parse_thermo.deduplicate_nested_dicts(nest_dict)

Remove duplicate nested dicts, comparing them by their inner dict values.

Parameters: nest_dict (dict) – Dictionary of dictionaries.
Return type: dict
Returns: A new dictionary with duplicates removed (keeps first occurrence, by sorted keys).
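
The "first occurrence, by sorted keys" rule can be illustrated with a short sketch. This is an assumption about how the rule plays out, not the package's code; `dedup_nested_sketch` is a hypothetical name, and the sketch treats two inner dicts as duplicates when they compare equal.

```python
def dedup_nested_sketch(nest_dict: dict) -> dict:
    """Hypothetical stand-in for deduplicate_nested_dicts."""
    seen = []    # inner dicts already kept (dicts are unhashable, so a list)
    result = {}
    # Visit outer keys in sorted order so "first occurrence" is deterministic.
    for key in sorted(nest_dict):
        inner = nest_dict[key]
        if inner not in seen:
            seen.append(inner)
            result[key] = inner
    return result


example = {"b": {"x": 1}, "a": {"x": 1}, "c": {"x": 2}}
print(dedup_nested_sketch(example))  # {'a': {'x': 1}, 'c': {'x': 2}}
```

In the example, "a" and "b" hold equal inner dicts, and "a" is kept because it sorts first.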

utils.MSParser.parse_thermo.get_scantypes(rawfile)

Get scan types from a raw file.

Parameters: rawfile (RawFileReaderAdapter) – Raw file reader adapter.
Return type: dict
Returns: Dictionary containing the scan types.

utils.MSParser.parse_thermo.get_traces(rawfile)

Get traces from a raw file.

Parameters: rawfile (RawFileReaderAdapter) – Raw file reader adapter.
Return type: dict
Returns: Dictionary containing the traces.

utils.MSParser.parse_thermo.parse_file(data_path, filename)

Parse a Thermo raw file.

Parameters:

data_path – Path to the data file.
filename – Name of the data file.

Return type: dict
Returns: Dictionary containing the parsed data.

utils.MSParser.parse_thermo.roundup(x)

Round a float up to the next multiple of 100.

Return type: int
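
Rounding up to a multiple of 100 is typically done with a ceiling division. The sketch below shows that arithmetic under the assumption that values already on a multiple of 100 are returned unchanged; `roundup_sketch` is a hypothetical name, not the package's function.

```python
import math


def roundup_sketch(x: float) -> int:
    """Hypothetical stand-in for roundup."""
    # Divide by 100, take the ceiling, and scale back up.
    return int(math.ceil(x / 100.0)) * 100


print(roundup_sketch(101.5))  # 200
print(roundup_sketch(200))    # 200
```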

MSParser.parse_timstof module

utils.MSParser.parse_timstof.get_directory_size(directory_path)

Calculate the total size of all files in a directory recursively.
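
A recursive size calculation can be sketched with `os.walk`. The following is an illustration only: `directory_size_sketch` is a hypothetical name, and both the unit (bytes) and the decision to skip symbolic links are assumptions that the documentation above does not state.

```python
import os


def directory_size_sketch(directory_path: str) -> int:
    """Hypothetical stand-in for get_directory_size; returns bytes."""
    total = 0
    # os.walk visits the directory and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(directory_path):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Assumption: symbolic links are skipped to avoid double counting.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total
```

This is relevant for timsTOF data because a run is stored as a `.d` directory rather than a single file, so its size has to be summed over the files inside it.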

utils.MSParser.parse_timstof.get_traces(ms1_df, ms2_df)

Get traces from a timsTOF file.

Parameters:

ms1_df – DataFrame containing the MS1 data.
ms2_df – DataFrame containing the MS2 data.

Returns: Dictionary containing the traces.

utils.MSParser.parse_timstof.parse_file(root, run_name, run_id_regex)

Parse a timsTOF file.

Parameters:

root – Root directory of the timsTOF file.
run_name – Name of the run.
run_id_regex – Regular expression to match the run ID.

Returns: Dictionary containing the parsed data.

Module contents

MSParser for pre-analysis of MS raw data files.