Parsers
RegexExtractor
Bases: BaseComponent
Simple class for extracting text from a document using a regex pattern.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pattern | List[str] | The regex pattern(s) to use. | required |
output_map | dict | A mapping from extracted text to the desired output. Defaults to None. | required |
Source code in libs/kotaemon/kotaemon/parsers/regex_extractor.py
run_raw_static (staticmethod)
Finds all non-overlapping occurrences of a pattern in a string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pattern | str | The regular expression pattern to search for. | required |
text | str | The input string to search in. | required |
Returns:
Type | Description |
---|---|
list[str] | A list of all non-overlapping occurrences of the pattern in the string. |
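The description above matches what Python's standard `re.findall` does for a pattern without capture groups. As a minimal sketch, assuming the method behaves that way (the helper name `find_all_matches` is hypothetical and not part of kotaemon):

```python
import re


def find_all_matches(pattern: str, text: str) -> list[str]:
    """Hypothetical stand-in: return every non-overlapping match, left to right."""
    # Assumption: the pattern has no capture groups, so re.findall
    # returns the full text of each match.
    return re.findall(pattern, text)


matches = find_all_matches(r"\d+", "order 12, item 345, qty 6")
print(matches)  # ['12', '345', '6']
```

"Non-overlapping" means the scan resumes after the end of each match, so searching for `aa` in `aaaa` yields two matches rather than three.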
map_output (staticmethod)
Maps the given text to its corresponding value in the output_map dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The input text to be mapped. | required |
output_map | dict | A dictionary containing mapping of input text to output values. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The corresponding value from the output_map dictionary. |
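The mapping step can be pictured as a plain dictionary lookup. The sketch below is illustrative only: `map_text` is a hypothetical name, and returning the input unchanged when it has no entry in the dictionary is an assumption, since this page does not state what happens for unmapped text.

```python
def map_text(text: str, output_map: dict) -> str:
    """Hypothetical stand-in for the mapping step."""
    # Assumption: text with no entry in output_map is returned unchanged.
    return output_map.get(text, text)


label_map = {"yes": "1", "no": "0"}
print(map_text("yes", label_map))    # 1
print(map_text("maybe", label_map))  # maybe
```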
run_raw
Matches the raw text against the pattern and runs the output mapping, returning an instance of ExtractorOutput.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The raw text to be processed. | required |
Returns:
Name | Type | Description |
---|---|---|
ExtractorOutput | ExtractorOutput | The processed output as an ExtractorOutput instance. |
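To show how matching and mapping might fit together, here is a rough sketch under stated assumptions. `SketchOutput` is a hypothetical stand-in for ExtractorOutput (its real fields are not listed on this page), `run_raw_sketch` is not a kotaemon function, and passing unmapped matches through unchanged is assumed rather than documented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SketchOutput:
    """Hypothetical stand-in for ExtractorOutput; the real fields may differ."""
    text: str
    matches: list[str] = field(default_factory=list)


def run_raw_sketch(text: str, patterns: list[str], output_map: dict) -> SketchOutput:
    # Step 1: collect matches for every pattern, in pattern order.
    found: list[str] = []
    for pattern in patterns:
        found.extend(re.findall(pattern, text))
    # Step 2: map each match; unmapped matches pass through (assumption).
    mapped = [output_map.get(m, m) for m in found]
    return SketchOutput(text=text, matches=mapped)


result = run_raw_sketch("answer: yes, then no", [r"yes|no"], {"yes": "1", "no": "0"})
print(result.matches)  # ['1', '0']
```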
run
Matches each input against the pattern and returns the output for each input.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | list[str] | Document | list[Document] | The input string(s) or document(s) to be processed. | required |
Returns:
Type | Description |
---|---|
list[ExtractorOutput] | A list containing the ExtractorOutput for each input. |
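Because the method accepts a single item or a list, and either strings or documents, one plausible first step is to normalise the input into a list of strings and then produce one result per input. The sketch below illustrates that idea only: `SketchDocument`, `to_texts`, and `run_sketch` are hypothetical names, and the assumption that a document exposes its content through a `text` attribute is not confirmed by this page.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class SketchDocument:
    """Hypothetical stand-in for Document; only a `text` attribute is assumed."""
    text: str


Item = Union[str, SketchDocument]


def to_texts(inputs: Union[Item, list[Item]]) -> list[str]:
    # Wrap a single input in a list, then unwrap documents to plain strings.
    items = inputs if isinstance(inputs, list) else [inputs]
    return [item if isinstance(item, str) else item.text for item in items]


def run_sketch(inputs: Union[Item, list[Item]], pattern: str) -> list[list[str]]:
    # One list of matches per input, in input order.
    return [re.findall(pattern, text) for text in to_texts(inputs)]


print(run_sketch("a1 b22", r"\d+"))                        # [['1', '22']]
print(run_sketch([SketchDocument("x3"), "none"], r"\d+"))  # [['3'], []]
```

The output list always has the same length as the normalised input list, which mirrors the "one output per input" contract described above.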