Table
check_col_conflicts
Check whether two columns A and B have non-empty content in the same row (to be used with merge_cols)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_a | List[str] | column A (list of str) | required |
col_b | List[str] | column B (list of str) | required |
thres | float | fraction of overlapping rows allowed | 0.15 |
Returns: whether the number of overlapping rows is greater than the threshold
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
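The overlap check described above can be sketched as follows. This is an independent illustration, not the code in table.py: the name `has_col_conflicts` is hypothetical, and treating `thres` as a fraction of the total row count is an assumption.

```python
from typing import List


def has_col_conflicts(col_a: List[str], col_b: List[str], thres: float = 0.15) -> bool:
    """Hypothetical sketch: True if too many rows are filled in both columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    # A row is a conflict when both columns hold non-empty text in it.
    conflicts = sum(1 for a, b in zip(col_a, col_b) if a and b)
    # Assumption: the threshold is a fraction of the total number of rows.
    return conflicts > thres * len(col_a)


sparse_a = ["x", "", "", ""]
sparse_b = ["", "y", "", "z"]
dense_b = ["p", "q", "", ""]

print(has_col_conflicts(sparse_a, sparse_b))  # False: no row is filled in both
print(has_col_conflicts(sparse_a, dense_b))   # True: 1 conflict > 0.15 * 4
```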
merge_cols
Merge columns A and B if they have no conflicting rows
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_a | List[str] | column A (list of str) | required |
col_b | List[str] | column B (list of str) | required |
Returns: merged column
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
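A row-wise merge of two columns might look like the sketch below. The name `merge_two_cols` is hypothetical, and joining the two cells with a single space is an assumption; the real function may combine cells differently.

```python
from typing import List


def merge_two_cols(col_a: List[str], col_b: List[str]) -> List[str]:
    """Hypothetical sketch: combine two columns row by row."""
    merged = []
    for a, b in zip(col_a, col_b):
        # Joining with a space and stripping keeps whichever cell is non-empty.
        merged.append(f"{a} {b}".strip())
    return merged


print(merge_two_cols(["x", "", ""], ["", "y", ""]))  # ['x', 'y', '']
```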
add_index_col
Add an index column as the first column of the table csv_rows
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_rows | List[List[str]] | input table | required |
Returns: output table with index column
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
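As a rough illustration of prepending an index column, consider the sketch below. The name `with_index_col` is hypothetical, and the zero-based, string-typed index is an assumption not stated in the description.

```python
from typing import List


def with_index_col(csv_rows: List[List[str]]) -> List[List[str]]:
    """Hypothetical sketch: prepend the row number to every row."""
    # Assumption: the index starts at 0 and is stored as text like other cells.
    return [[str(i)] + list(row) for i, row in enumerate(csv_rows)]


print(with_index_col([["a", "b"], ["c", "d"]]))  # [['0', 'a', 'b'], ['1', 'c', 'd']]
```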
compress_csv
Compress table csv_rows by merging sparse columns (merge_cols)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_rows | List[List[str]] | input table | required |
Returns: compressed table
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
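One way to picture the compression is a left-to-right pass that folds each sparse column into its left neighbour when the two rarely overlap. The sketch below is self-contained and makes several assumptions: the name `compress_rows`, the `thres` parameter, the greedy merge order, and a rectangular input table are all hypothetical choices, not taken from table.py.

```python
from typing import List


def compress_rows(csv_rows: List[List[str]], thres: float = 0.15) -> List[List[str]]:
    """Hypothetical sketch: merge each column into its left neighbour when they rarely overlap."""
    if not csv_rows:
        return []
    # Work column-wise: transpose rows into columns (assumes equal-length rows).
    cols = [list(col) for col in zip(*csv_rows)]
    kept: List[List[str]] = []
    for col in cols:
        if kept:
            prev = kept[-1]
            # Count rows where both the previous kept column and this one have text.
            overlap = sum(1 for a, b in zip(prev, col) if a and b)
            if overlap <= thres * len(col):
                kept[-1] = [f"{a} {b}".strip() for a, b in zip(prev, col)]
                continue
        kept.append(col)
    # Transpose back to rows.
    return [list(row) for row in zip(*kept)]


table = [
    ["Name", "", "Age"],
    ["", "Jake", "20"],
    ["Mary", "", "21"],
]
print(compress_rows(table))  # [['Name', 'Age'], ['Jake', '20'], ['Mary', '21']]
```

Here the first two columns never fill the same row, so they collapse into one, while the third column overlaps with the merged result on every row and is kept separate.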
get_table_from_ocr
Get the list of text lines that belong to the table regions specified by table_list
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ocr_list | List[dict] | list of OCR output in Casia format (Flax) | required |
table_list | List[dict] | list of table output in Casia format (Flax) | required |
Returns:
Name | Type | Description |
---|---|---|
_type_ |
description |
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
make_markdown_table
Convert table rows in list format to markdown string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
Example | Input | | required |
Returns: String to put into a .md file
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
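The conversion from rows to a Markdown string can be sketched as below. The name `rows_to_markdown` is hypothetical; treating the first row as the header and leaving `|` characters inside cells unescaped are assumptions made for brevity.

```python
from typing import List


def rows_to_markdown(rows: List[List[str]]) -> str:
    """Hypothetical sketch: first row is the header, remaining rows are the body."""
    if not rows:
        return ""

    def line(cells: List[str]) -> str:
        return "| " + " | ".join(str(c) for c in cells) + " |"

    header, *body = rows
    # One "---" per header cell forms the separator row.
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)]) + "\n"


print(rows_to_markdown([["Name", "Age"], ["Jake", "20"]]))
```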
parse_csv_string_to_list
Convert CSV string to list of rows
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_str | str | input CSV string | required |
Returns:
Type | Description |
---|---|
List[List[str]] | Output table in list format |
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
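Parsing a CSV string into rows can be done with the standard library, as in this minimal sketch. The name `csv_to_rows` is hypothetical, and using `csv.reader` with its default dialect is an assumption about how quoting and delimiters should be handled.

```python
import csv
import io
from typing import List


def csv_to_rows(csv_str: str) -> List[List[str]]:
    """Hypothetical sketch: parse a CSV string into a list of rows."""
    # csv.reader handles quoted cells that contain commas.
    reader = csv.reader(io.StringIO(csv_str))
    return [row for row in reader]


print(csv_to_rows('a,b\n"x, y",z\n'))  # [['a', 'b'], ['x, y', 'z']]
```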
format_cell
Format cell content by removing redundant characters and enforcing a length limit
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cell | str | input cell text | required |
length_limit | Optional[int] | limit of text length | None |
Returns:
Type | Description |
---|---|
str | new cell text |
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
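The description does not say which characters count as redundant. The sketch below assumes it means runs of whitespace (including newlines) and that the limit truncates by character count; the name `tidy_cell` is hypothetical.

```python
import re
from typing import Optional


def tidy_cell(cell: str, length_limit: Optional[int] = None) -> str:
    """Hypothetical sketch: normalise whitespace, then optionally truncate."""
    # Assumption: "redundant characters" are repeated whitespace and newlines.
    cleaned = re.sub(r"\s+", " ", cell).strip()
    if length_limit is not None:
        cleaned = cleaned[:length_limit]
    return cleaned


print(tidy_cell("  total\n  amount "))            # total amount
print(tidy_cell("total amount", length_limit=5))  # total
```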
extract_tables_from_csv_string
Extract a list of tables from FullOCR output (csv_content) using the specified table_texts
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_content | str | CSV output from FullOCR pipeline | required |
table_texts | List[List[str]] | list of table texts extracted | required |
Returns:
Type | Description |
---|---|
Tuple[List[str], str] | List of tables and non-text content |
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
parse_markdown_text_to_tables
Convert markdown text to list of non-table spans and table spans
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | input markdown text | required |
Returns:
Type | Description |
---|---|
Tuple[List[str], List[str]] | list of table spans and non-table spans |
Source code in libs/kotaemon/kotaemon/loaders/utils/table.py
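Splitting Markdown text into table and non-table spans could work roughly as sketched here. The name `split_markdown_tables` is hypothetical, and two things are assumed: a line belongs to a table when it starts with `|`, and the result is ordered as (table spans, non-table spans) to match the return description.

```python
from typing import List, Tuple


def split_markdown_tables(text: str) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch: group consecutive lines into table and non-table spans."""
    tables: List[str] = []
    texts: List[str] = []
    current: List[str] = []
    in_table = False

    def flush() -> None:
        # Store the span collected so far in the list matching its kind.
        if current:
            (tables if in_table else texts).append("\n".join(current))
            current.clear()

    for line in text.split("\n"):
        # Assumption: table lines start with a pipe character.
        is_table_line = line.strip().startswith("|")
        if is_table_line != in_table:
            flush()
            in_table = is_table_line
        current.append(line)
    flush()
    return tables, texts


sample = "Intro\n| a | b |\n| --- | --- |\n| 1 | 2 |\nOutro"
table_spans, text_spans = split_markdown_tables(sample)
print(table_spans)  # ['| a | b |\n| --- | --- |\n| 1 | 2 |']
print(text_spans)   # ['Intro', 'Outro']
```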
table_cells_to_markdown
Convert a list of cells with attached text to a Markdown table