Kkit.scaling_code
This module traverses a file line by line and extracts data according to a repeating pattern (the pattern loop).
It is typically used to extract data from the results of code scalability tests.
For example, the result file is:
number of threads: 1
execution time: 0.10s
number of threads: 2
execution time: 0.21s
number of threads: 3
execution time: 0.31s
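The pattern loop is:
number of threads: $$
execution time: $$s
Each "$$" marks a numeric value to extract. A pattern must cover the whole line, so the trailing "s" of the unit is part of the second pattern.
A data extractor can be built as follows:
#test.py
from Kkit.scaling_code import Data, Section, Line, extract_info_cli
d = Data(
    Section(
        Line("number of threads: $$", "N_threads"),
        Line("execution time: $$s", "execution_time")
    )
)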
extract_info_cli(d)
Then run the extractor:
# use --help to see the help information
python test.py --help
# run the command
python test.py scalability_result.txt -o test.csv
The result will be saved in the test.csv file as a table:
N_threads,execution_time
1,0.1
2,0.21
3,0.31
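Judging from the module source, each "$$" is replaced by a numeric regular expression and the resulting pattern is anchored at both ends of the line. The sketch below reproduces only that expansion step to show which lines a pattern accepts; `expand` is a hypothetical helper name, not part of the Kkit API.

```python
import re

# Same numeric pattern as in the module source: an integer or decimal
# number, optionally surrounded by a single space.
DATA_PATTERN = r" ?(\d+\.?\d*) ?"


def expand(pattern: str) -> str:
    """Hypothetical helper: replace each "$$" and anchor both ends of the line."""
    return "^" + pattern.replace("$$", DATA_PATTERN) + "$"


threads = expand("number of threads: $$")
time_no_unit = expand("execution time: $$")
time_with_unit = expand("execution time: $$s")

matches_threads = re.match(threads, "number of threads: 2") is not None
# Without the trailing "s" the anchored pattern rejects a line ending in "s".
matches_no_unit = re.match(time_no_unit, "execution time: 0.21s") is not None
matches_with_unit = re.match(time_with_unit, "execution time: 0.21s") is not None
value = re.match(time_with_unit, "execution time: 0.21s").group(1)

print(matches_threads, matches_no_unit, matches_with_unit, value)
# prints: True False True 0.21
```

Because the pattern text is inserted into the regular expression unescaped, characters such as "." or "(" in a pattern keep their regex meaning.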
"""
This module traverses a file line by line and extracts data according to a repeating pattern (the pattern loop).

It is typically used to extract data from the results of code scalability tests.

For example, the result file is:

```txt
number of threads: 1
execution time: 0.10s
number of threads: 2
execution time: 0.21s
number of threads: 3
execution time: 0.31s
```

The pattern loop is:

```
number of threads: $$
execution time: $$s
```

A data extractor can be built as follows:

```python
#test.py
from Kkit.scaling_code import Data, Section, Line, extract_info_cli

d = Data(
    Section(
        Line("number of threads: $$", "N_threads"),
        Line("execution time: $$s", "execution_time")
    )
)

extract_info_cli(d)
```

Then run the extractor:

```shell
# use --help to see the help information
python test.py --help

# run the command
python test.py scalability_result.txt -o test.csv
```

The result will be saved in the test.csv file as a table:

```csv
N_threads,execution_time
1,0.1
2,0.21
3,0.31
```
"""

import re
import pandas as pd
import argparse
import warnings


if_warning = 0
"""@private"""
data_pattern = r" ?(\d+\.?\d*) ?"
"""@private"""

def match_string(pattern, string):
    """@private"""
    if re.match(pattern, string):
        return True
    else:
        return False

class Line:
    def __init__(self, pattern: str, *labels):
        """
        Initialize the Line object.

        Parameters
        ----------
        pattern : str
            The pattern of the line. The pattern should look like "number of threads: $$", where "$$" marks the position of a numeric value.

        *labels : str
            The labels of the extracted values, i.e. the column names they get in the output table.
            For the pattern "number of threads: $$", the label can be "number of threads".
            A pattern with several "$$" placeholders takes one label per placeholder. For "number of threads: $$, execution time: $$", the labels can be "number of threads" and "execution time".
            ```python
            Line("number of threads: $$, execution time: $$", "number of threads", "execution time")
            ```
        """
        self.full_pattern = "^"+pattern.replace("$$", data_pattern)+"$"
        """@private"""
        self.labels = labels
        """@private"""

    def analyse(self, string, line_number):
        """@private"""
        if self.labels==tuple():
            return {}
        if match_string(self.full_pattern, string) == False:
            if if_warning == 0:
                return {}
            else:
                warnings.warn("The line(%s) can't match the pattern(%s), skip line %d"%(string, self.full_pattern, line_number))
                return {}

        data = re.findall(data_pattern, string)
        if all(map(lambda x: "." in x, data)):
            data = [float(i) for i in data]
        else:
            data = [int(i) for i in data]
        if len(data)!=len(self.labels):
            raise Exception("The number of data(%d) is different from labels(%d), line %d"%(len(data), len(self.labels), line_number))
        return {l:d for l,d in zip(self.labels, data)}

class Section:
    def __init__(self, *Lines):
        """
        Initialize the Section object.

        A Section object can contain multiple Line objects.

        Parameters
        ----------
        *Lines : Line
            The Line objects in this section.
        """
        self.lines = Lines
        """@private"""

class Data:
    def __init__(self, *Sections):
        """
        Initialize the Data object.

        A Data object can contain multiple Section objects.

        Parameters
        ----------
        *Sections : Section
            The Section objects in this data.
        """
        self.sections = Sections
        """@private"""
    def generate(self, file_path, encoding="utf-8"):
        """@private"""
        with open(file_path, "r", encoding=encoding) as f:
            lines = [i.rstrip("\n") for i in f.readlines() if i.rstrip("\n")!=""]
        result = {}
        for s in self.sections:
            index = 0
            for i, line in enumerate(lines):
                temp = s.lines[index%len(s.lines)].analyse(line, i)
                if temp!={}:
                    index+=1
                    for j,k in temp.items():
                        if j in result:
                            result[j].append(k)
                        else:
                            result[j] = [k]
        return pd.DataFrame(result)

def extract_info_cli(data: Data):
    """
    Initialize the command line interface for the data extractor.

    Parameters
    ----------
    data : Data
        The Data object that describes what to extract.
    """
    parser = argparse.ArgumentParser(description="Process some files.")

    # Add the arguments
    parser.add_argument("input_file", type=str, help="The name of the input file.")
    parser.add_argument("-o", "--output_file", type=str, help="The name of the output file.")
    parser.add_argument("-e", "--encoding", default="utf-8", type=str, help="The encoding to use.")

    # Parse the arguments
    args = parser.parse_args()

    data.generate(args.input_file, args.encoding).to_csv(args.output_file, index=False, encoding=args.encoding)

def extract_info(data: Data, file_path: str, encoding="utf-8"):
    """
    Extract the data from the file.

    Using the `extract_info_cli` function is recommended because it is more flexible.

    Parameters
    ----------
    data : Data
        The Data object that describes what to extract.

    file_path : str
        The path of the file.

    encoding: str
        The encoding of the file. Default is "utf-8".

    Returns
    -------
    pd.DataFrame
        The data extracted from the file.
    """
    return data.generate(file_path, encoding)

if __name__=="__main__":
    d = Data(
        Section(
            Line("MKL number of threads: $$", "N_threads"),
            Line("cmkl_permut total time: $$", "cfunc_time"),
            Line("cmkl_permut permutation time: $$", "permut_time")
        ),
        Section(
            Line("Function permutation_mkl took $$ seconds to run.", "python_func_time")
        )
    )
    d.generate("./new_txt.txt").to_csv("test.csv", index=False)
class Line:
Line(pattern: str, *labels)
Initialize the Line object.
Parameters
pattern : str The pattern of the line. The pattern should look like "number of threads: $$", where "$$" marks the position of a numeric value.
*labels : str The labels of the extracted values, i.e. the column names they get in the output table. For the pattern "number of threads: $$", the label can be "number of threads". A pattern with several "$$" placeholders takes one label per placeholder: for "number of threads: $$, execution time: $$", the labels can be "number of threads" and "execution time".
Line("number of threads: $$, execution time: $$", "number of threads", "execution time")
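As a rough illustration of what a multi-label Line yields, the following standalone sketch parses one line into a label-to-value mapping. `parse_line` is a hypothetical helper written for this example. It deviates from the module source in two labeled ways: it reads the values from the captured groups instead of calling `re.findall` on the whole line, and it converts each value on its own instead of converting all values to float only when every value contains a ".".

```python
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same numeric pattern as the module source


def parse_line(pattern, labels, text):
    """Hypothetical helper: return {label: value}, or {} if the line does not match."""
    full_pattern = "^" + pattern.replace("$$", DATA_PATTERN) + "$"
    match = re.match(full_pattern, text)
    if match is None:
        return {}
    values = match.groups()
    if len(values) != len(labels):
        raise ValueError(
            "got %d values for %d labels" % (len(values), len(labels))
        )
    # Assumption of this sketch: convert each value independently.
    return {
        label: (float(raw) if "." in raw else int(raw))
        for label, raw in zip(labels, values)
    }


row = parse_line(
    "number of threads: $$, execution time: $$",
    ("number of threads", "execution time"),
    "number of threads: 4, execution time: 0.52",
)
print(row)
# prints: {'number of threads': 4, 'execution time': 0.52}
```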
class Section:
Section(*Lines)
Initialize the Section object.
A Section object can contain multiple Line objects.
Parameters
*Lines : Line The Line objects in this section.
class Data:
Data(*Sections)
Initialize the Data object.
A Data object can contain multiple Section objects.
Parameters
*Sections : Section The Section objects in this data.
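The module source suggests that each Section is applied in its own pass over the non-empty lines of the file: the Line patterns are tried in order, the position advances only when the expected pattern matches, and it wraps around after the last one. A minimal sketch of that cycling, assuming one label per line and using the hypothetical helper name `collect`:

```python
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same numeric pattern as the module source


def collect(section, lines):
    """Hypothetical helper. `section` is a list of (pattern, label) pairs."""
    compiled = [
        (re.compile("^" + pattern.replace("$$", DATA_PATTERN) + "$"), label)
        for pattern, label in section
    ]
    columns = {}
    index = 0  # which pattern of the loop is expected next
    for text in lines:
        regex, label = compiled[index % len(compiled)]
        match = regex.match(text)
        if match is None:
            continue  # non-matching lines are skipped, the position is kept
        raw = match.group(1)
        columns.setdefault(label, []).append(float(raw) if "." in raw else int(raw))
        index += 1
    return columns


lines = [
    "number of threads: 1",
    "execution time: 0.10s",
    "some unrelated log line",
    "number of threads: 2",
    "execution time: 0.21s",
]
section = [
    ("number of threads: $$", "N_threads"),
    ("execution time: $$s", "execution_time"),  # trailing "s" covers the unit
]
columns = collect(section, lines)
print(columns)
# prints: {'N_threads': [1, 2], 'execution_time': [0.1, 0.21]}
```

With several sections, repeating this pass once per section and merging the columns gives one table, provided every column ends up with the same number of values.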
def extract_info_cli(data: Data):
Run the data extractor through a command line interface: parse the arguments, extract the data from the input file, and write the result to the output file as CSV.
Parameters
data : Data The Data object that describes what to extract.
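The arguments accepted by the command line interface can be explored without touching any file. The sketch below rebuilds the same three arguments that appear in the module source; `build_parser` is a hypothetical name used only here.

```python
import argparse


def build_parser():
    """Hypothetical helper that mirrors the arguments defined in the module source."""
    parser = argparse.ArgumentParser(description="Process some files.")
    parser.add_argument("input_file", type=str, help="The name of the input file.")
    parser.add_argument("-o", "--output_file", type=str, help="The name of the output file.")
    parser.add_argument("-e", "--encoding", default="utf-8", type=str, help="The encoding to use.")
    return parser


# Passing a list to parse_args avoids reading sys.argv.
args = build_parser().parse_args(["scalability_result.txt", "-o", "test.csv"])
print(args.input_file, args.output_file, args.encoding)
# prints: scalability_result.txt test.csv utf-8

# Without -o, output_file is None.
no_output = build_parser().parse_args(["scalability_result.txt"])
print(no_output.output_file)
# prints: None
```

Since `-o` has no default, omitting it passes `None` to `DataFrame.to_csv`, which then returns the CSV text instead of writing a file, so it is safest to always give an output file.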
def extract_info(data: Data, file_path: str, encoding="utf-8"):
Extract the data from the file.
Using the extract_info_cli function is recommended because it is more flexible.
Parameters
data : Data The Data object that describes what to extract.
file_path : str The path of the file.
encoding: str The encoding of the file. Default is "utf-8".
Returns
pd.DataFrame The data extracted from the file.