MSParser Package

The MSParser package provides utilities for pre-analysis of MS raw data files, including parsers for Thermo and timsTOF instruments.

Submodules

MSParser.ms_name_identifier module

utils.MSParser.ms_name_identifier.identify(raw_file_path)
Return type:

str

MSParser.MSparser module

utils.MSParser.MSparser.analyze(filename, outdir, errorfile)
utils.MSParser.MSparser.calculate_auc(ser)
utils.MSParser.MSparser.count_intercepts(xydata)
utils.MSParser.MSparser.handle_data(data_dict, oname)
utils.MSParser.MSparser.main()
utils.MSParser.MSparser.parse_d(root, filename, run_id_regex)
utils.MSParser.MSparser.parse_raw(root, filename)
utils.MSParser.MSparser.read_toml(toml_file)
utils.MSParser.MSparser.remove_accent_characters(text)

Replace accented characters with their unaccented equivalents.

Parameters:

text (str) – Input string containing accented characters.

Return type:

str

Returns:

String with accented characters replaced by unaccented equivalents.
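
One common way to get this behaviour is Unicode decomposition. The sketch below is illustrative only: the name strip_accents is hypothetical, and the package's actual implementation may use a different approach (for example, a lookup table).

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical stand-in for remove_accent_characters."""
    # NFD splits each accented character into a base character
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Müller_Señal"))  # Muller_Senal
```

Characters without a decomposition (such as "ø" or "ß") pass through unchanged under this approach.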

utils.MSParser.MSparser.remove_rawfile_ending(column_name)

Remove the raw file ending from a column name. For example, the column name ‘run1.raw’ becomes ‘run1’.

Return type:

str
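
A minimal sketch of the behaviour described above. The name strip_raw_suffix is hypothetical, and two details are assumptions not stated in this reference: that only the ".raw" ending is handled, and that the match is case-insensitive.

```python
def strip_raw_suffix(column_name: str) -> str:
    """Hypothetical stand-in for remove_rawfile_ending."""
    suffix = ".raw"
    # Assumption: match the ending regardless of case (".raw", ".RAW").
    if column_name.lower().endswith(suffix):
        return column_name[: -len(suffix)]
    # Names without the ending are returned unchanged.
    return column_name


print(strip_raw_suffix("run1.raw"))  # run1
```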

utils.MSParser.MSparser.replace_special_characters(text, replacewith='.', dict_and_re=False, replacement_dict=None, stripresult=True, remove_duplicates=False, make_lowercase=True, allow_numbers=True, allow_space=False, mask_first_digit=None)

Replace special characters in a string with specified replacements.

Parameters:
  • text (str) – Input string containing special characters.

  • replacewith (str) – Character to use for replacement.

  • dict_and_re (bool) – Whether to apply both dictionary replacements and regex.

  • replacement_dict (Optional[Dict[str, str]]) – Mapping of specific substrings to replacements.

  • stripresult (bool) – Strip whitespace and replacement characters from result.

  • remove_duplicates (bool) – Collapse consecutive replacement characters.

  • make_lowercase (bool) – Convert result to lowercase.

  • allow_numbers (bool) – Allow numbers in the result.

  • allow_space (bool) – Allow spaces in the result.

  • mask_first_digit (str | None) – Character to prefix when first char is a digit.

Return type:

str

Returns:

String with special characters replaced.
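
To show how these options might interact, here is a simplified sketch. It is not the package's code: the function name sanitize is hypothetical, the order in which the steps run is an assumption, "special" is assumed to mean anything outside ASCII letters (plus digits and spaces when allowed), and dict_and_re and replacement_dict are left out.

```python
import re


def sanitize(text, replacewith=".", stripresult=True, remove_duplicates=False,
             make_lowercase=True, allow_numbers=True, allow_space=False,
             mask_first_digit=None):
    """Hypothetical, simplified stand-in for replace_special_characters."""
    # Build the set of characters that are kept as-is.
    allowed = "a-zA-Z"
    if allow_numbers:
        allowed += "0-9"
    if allow_space:
        allowed += " "
    # Replace every character outside the allowed set.
    result = re.sub(f"[^{allowed}]", lambda _: replacewith, text)
    # Collapse runs of the replacement string into a single occurrence.
    if remove_duplicates and replacewith:
        run = f"(?:{re.escape(replacewith)})+"
        result = re.sub(run, lambda _: replacewith, result)
    # Trim whitespace and replacement characters from both ends.
    if stripresult:
        result = result.strip(replacewith + " \t\n")
    if make_lowercase:
        result = result.lower()
    # Prefix a mask character when the result starts with a digit.
    if mask_first_digit is not None and result and result[0].isdigit():
        result = mask_first_digit + result
    return result


print(sanitize("Sample #1 (A)"))                          # sample..1..a
print(sanitize("Sample #1 (A)", remove_duplicates=True))  # sample.1.a
print(sanitize("1st run", mask_first_digit="x"))          # x1st.run
```

The mask_first_digit case matters when the result is used as an identifier, such as a column name in tools that reject names starting with a digit.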

MSParser.parse_thermo module

utils.MSParser.parse_thermo.deduplicate_nested_dicts(nest_dict)

Remove duplicate nested dicts, where two entries count as duplicates if their inner dicts are equal.

Parameters:

nest_dict (dict) – Dictionary of dictionaries.

Return type:

dict

Returns:

A new dictionary with duplicates removed (keeps first occurrence, by sorted keys).
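
The sketch below shows one reading of "keeps first occurrence, by sorted keys": outer keys are visited in sorted order, and an entry is dropped if an equal inner dict has already been kept. The name dedupe_nested is hypothetical, and the actual function may compare or order entries differently.

```python
def dedupe_nested(nest_dict: dict) -> dict:
    """Hypothetical stand-in for deduplicate_nested_dicts."""
    seen = []    # inner dicts already kept (dicts are unhashable, so use a list)
    result = {}
    # Visit outer keys in sorted order so "first occurrence" is deterministic.
    for key in sorted(nest_dict):
        inner = nest_dict[key]
        if inner not in seen:
            seen.append(inner)
            result[key] = inner
    return result


scans = {"b": {"res": 60000}, "a": {"res": 60000}, "c": {"res": 15000}}
print(dedupe_nested(scans))  # {'a': {'res': 60000}, 'c': {'res': 15000}}
```

The input dictionary is not modified; a new dictionary is returned, matching the description above.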

utils.MSParser.parse_thermo.get_scantypes(rawfile)

Get scantypes from a raw file.

Parameters:

rawfile (RawFileReaderAdapter) – Raw file reader adapter.

Return type:

dict

Returns:

Dictionary containing the scantypes.

utils.MSParser.parse_thermo.get_traces(rawfile)

Get traces from a raw file.

Parameters:

rawfile (RawFileReaderAdapter) – Raw file reader adapter.

Return type:

dict

Returns:

Dictionary containing the traces.

utils.MSParser.parse_thermo.parse_file(data_path, filename)

Parse a raw file.

Parameters:
  • data_path – Path to the data file.

  • filename – Name of the data file.

Return type:

dict

Returns:

Dictionary containing the parsed data.

utils.MSParser.parse_thermo.roundup(x)

Round a float up to the nearest multiple of 100.

Return type:

int
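
A minimal sketch of rounding up to a multiple of 100, assuming values that already sit on a multiple are returned unchanged. The name round_up_to_hundred is hypothetical.

```python
import math


def round_up_to_hundred(x: float) -> int:
    """Hypothetical stand-in for roundup."""
    # Divide by 100, take the ceiling, then scale back up.
    return int(math.ceil(x / 100.0)) * 100


print(round_up_to_hundred(101.5))  # 200
print(round_up_to_hundred(200.0))  # 200
```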

MSParser.parse_timstof module

utils.MSParser.parse_timstof.get_directory_size(directory_path)

Calculate the total size of all files in a directory recursively.
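
A recursive size calculation like this can be sketched with os.walk. The name directory_size_bytes is hypothetical, and two details are assumptions: that the result is in bytes, and that symbolic links are skipped.

```python
import os


def directory_size_bytes(directory_path) -> int:
    """Hypothetical stand-in for get_directory_size."""
    total = 0
    # os.walk visits the directory and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(directory_path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            # Assumption: skip symlinks so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

A helper of this kind is useful for timsTOF data because a run is stored as a ".d" directory rather than a single file, so the size of the run is the sum of the files inside it.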

utils.MSParser.parse_timstof.get_traces(ms1_df, ms2_df)

Get traces from a timsTOF file.

Parameters:
  • ms1_df – DataFrame containing the MS1 data.

  • ms2_df – DataFrame containing the MS2 data.

Returns:

Dictionary containing the traces.

utils.MSParser.parse_timstof.parse_file(root, run_name, run_id_regex)

Parse a timsTOF file.

Parameters:
  • root – Root directory of the timsTOF file.

  • run_name – Name of the run.

  • run_id_regex – Regular expression to match the run ID.

Returns:

Dictionary containing the parsed data.

Module contents

MSParser for pre-analysis of MS raw data files.