Kkit.scaling_code

This module scans a text file line by line and extracts numeric data by matching a repeating sequence of line patterns (the "pattern loop").

It is typically used to extract data from the output of code scalability tests.

For example, suppose the result file is:

number of threads: 1
execution time: 0.10s
number of threads: 2
execution time: 0.21s
number of threads: 3
execution time: 0.31s

The pattern loop is:

number of threads: $$
execution time: $$s

The data extractor can be built as:

# test.py
from Kkit.scaling_code import Data, Section, Line, extract_info_cli

d = Data(
    Section(
        Line("number of threads: $$", "N_threads"),
        Line("execution time: $$s", "execution_time")
    )
)

extract_info_cli(d)
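How a "$$" pattern becomes a line matcher can be sketched with Python's re module. This is a stdlib-only sketch, not Kkit's code: DATA_PATTERN copies the module's data_pattern, and compile_line_pattern is a hypothetical helper that mimics how Line builds its regex. Each "$$" becomes a number-capturing group, and the pattern is anchored with ^ and $. Because of the anchoring, a line with text after the number, such as the unit s in 0.10s, only matches if that text is also part of the pattern.

```python
import re

# Copy of the module's data_pattern: optional space, an unsigned integer or
# decimal number (captured), optional space.
DATA_PATTERN = r" ?(\d+\.?\d*) ?"


def compile_line_pattern(pattern: str) -> str:
    """Hypothetical helper: expand "$$" and anchor the pattern to the whole line."""
    return "^" + pattern.replace("$$", DATA_PATTERN) + "$"


line = "execution time: 0.10s"
loose = compile_line_pattern("execution time: $$")
strict = compile_line_pattern("execution time: $$s")

print(re.match(loose, line))            # None: the trailing "s" is not allowed by "$"
print(re.match(strict, line).group(1))  # 0.10
```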

Then run the extractor:

# use --help to see the help information
python test.py --help

# run the command
python test.py scalability_result.txt -o test.csv

The result is saved to test.csv as a table:

N_threads,execution_time
1,0.1
2,0.21
3,0.31
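The pattern loop for this example can be reproduced without Kkit or pandas. The sketch below approximates the idea and is not the module's implementation. It assumes a single section with exactly one "$$" per pattern, and extract and to_number are names made up for illustration. It keeps a position in the loop, checks each non-empty line only against the pattern it is currently waiting for, advances on a match, and writes the collected columns with the csv module.

```python
import csv
import io
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same placeholder regex as the module


def to_number(text: str):
    """Parse a captured value: float if it contains a decimal point, else int."""
    return float(text) if "." in text else int(text)


def extract(text: str, section: list) -> dict:
    """Run one pattern loop over `text`; `section` is a list of (pattern, label) pairs."""
    regexes = [("^" + p.replace("$$", DATA_PATTERN) + "$", label) for p, label in section]
    columns = {label: [] for _, label in section}
    index = 0  # position in the loop: which pattern we are waiting for
    for line in text.splitlines():
        if not line:
            continue  # empty lines are ignored
        regex, label = regexes[index % len(regexes)]
        match = re.match(regex, line)
        if match:
            columns[label].append(to_number(match.group(1)))
            index += 1  # move on to the next pattern only after a match
    return columns


result_file = """number of threads: 1
execution time: 0.10s
number of threads: 2
execution time: 0.21s
number of threads: 3
execution time: 0.31s
"""

columns = extract(result_file, [("number of threads: $$", "N_threads"),
                                ("execution time: $$s", "execution_time")])

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns.keys())
writer.writerows(zip(*columns.values()))
print(buffer.getvalue(), end="")
# N_threads,execution_time
# 1,0.1
# 2,0.21
# 3,0.31
```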
"""
This module scans a text file line by line and extracts numeric data by matching a repeating sequence of line patterns (the "pattern loop").

It is typically used to extract data from the output of code scalability tests.

For example, suppose the result file is:

```txt
number of threads: 1
execution time: 0.10s
number of threads: 2
execution time: 0.21s
number of threads: 3
execution time: 0.31s
```

The pattern loop is:

```
number of threads: $$
execution time: $$s
```

The data extractor can be built as:

```python
# test.py
from Kkit.scaling_code import Data, Section, Line, extract_info_cli

d = Data(
    Section(
        Line("number of threads: $$", "N_threads"),
        Line("execution time: $$s", "execution_time")
    )
)

extract_info_cli(d)
```

Then run the extractor:

```bash
# use --help to see the help information
python test.py --help

# run the command
python test.py scalability_result.txt -o test.csv
```

The result is saved to test.csv as a table:

```csv
N_threads,execution_time
1,0.1
2,0.21
3,0.31
```
"""

import re
import pandas as pd
import argparse
import warnings


if_warning = 0
"""@private"""
data_pattern = r" ?(\d+\.?\d*) ?"
"""@private"""

def match_string(pattern, string):
    """@private"""
    return re.match(pattern, string) is not None

class Line:
    def __init__(self, pattern: str, *labels):
        """
        Initialize the Line object.

        Parameters
        ----------
        pattern : str
            The pattern of the line, e.g. "number of threads: $$", where each "$$" marks a number to capture.
            The rest of the pattern is matched literally and must cover the whole line.

        *labels : str
            The labels of the captured values, i.e. their column names in the output table.
            For example, for the pattern "number of threads: $$" the label could be "number of threads".
            A pattern can contain several placeholders with one label each, e.g.
            ```python
            Line("number of threads: $$, execution time: $$", "number of threads", "execution time")
            ```
        """
        # Escape the literal text so characters such as "." or "(" are matched as-is,
        # put a number-capturing group at every "$$", and anchor the pattern to the whole line.
        literal_parts = [re.escape(part) for part in pattern.split("$$")]
        self.full_pattern = "^" + data_pattern.join(literal_parts) + "$"
        """@private"""
        self.labels = labels
        """@private"""

    def analyse(self, string, line_number):
        """@private"""
        if not self.labels:
            return {}
        match = re.match(self.full_pattern, string)
        if match is None:
            if if_warning:
                warnings.warn("The line (%s) does not match the pattern (%s), skipping line %d" % (string, self.full_pattern, line_number))
            return {}

        # Use only the values captured by the "$$" groups (never digits in the literal text),
        # converting each one separately: float if it has a decimal point, int otherwise.
        data = [float(v) if "." in v else int(v) for v in match.groups()]
        if len(data) != len(self.labels):
            raise ValueError("The number of values (%d) differs from the number of labels (%d), line %d" % (len(data), len(self.labels), line_number))
        return {l: d for l, d in zip(self.labels, data)}

class Section:
    def __init__(self, *Lines):
        """
        Initialize the Section object.

        A Section object contains one or more Line objects. The lines are matched in order,
        and the sequence repeats as a loop.

        Parameters
        ----------
        *Lines : Line
            The Line objects in this section.
        """
        self.lines = Lines
        """@private"""

class Data:
    def __init__(self, *Sections):
        """
        Initialize the Data object.

        A Data object contains one or more Section objects. Each section scans the whole file
        independently, and the columns from all sections are combined into one table.

        Parameters
        ----------
        *Sections : Section
            The Section objects in this data.
        """
        self.sections = Sections
        """@private"""

    def generate(self, file_path, encoding="utf-8"):
        """@private"""
        # Keep (line number, text) pairs for the non-empty lines so warnings report the real line.
        with open(file_path, "r", encoding=encoding) as f:
            lines = [(n, l.rstrip("\n")) for n, l in enumerate(f, start=1) if l.rstrip("\n") != ""]
        result = {}
        for s in self.sections:
            index = 0  # position in this section's pattern loop
            for n, line in lines:
                temp = s.lines[index % len(s.lines)].analyse(line, n)
                if temp:
                    # The expected line matched: record its values and wait for the next pattern.
                    index += 1
                    for label, value in temp.items():
                        result.setdefault(label, []).append(value)
        return pd.DataFrame(result)

def extract_info_cli(data: Data):
    """
    Run the command-line interface for the data extractor.

    Parses input_file, -o/--output_file and -e/--encoding from the command line,
    extracts the data from input_file, and writes it as CSV to output_file
    (or to standard output if -o is omitted).

    Parameters
    ----------
    data : Data
        The Data object describing the patterns to extract.
    """
    parser = argparse.ArgumentParser(description="Extract data from a result file into a CSV table.")

    # Add the arguments
    parser.add_argument("input_file", type=str, help="The name of the input file.")
    parser.add_argument("-o", "--output_file", type=str, help="The name of the output CSV file. If omitted, the CSV is printed to standard output.")
    parser.add_argument("-e", "--encoding", default="utf-8", type=str, help="The encoding to use.")

    # Parse the arguments
    args = parser.parse_args()

    df = data.generate(args.input_file, args.encoding)
    if args.output_file is None:
        # DataFrame.to_csv(None) returns the CSV text instead of writing a file.
        print(df.to_csv(index=False), end="")
    else:
        df.to_csv(args.output_file, index=False, encoding=args.encoding)

def extract_info(data: Data, file_path: str, encoding="utf-8"):
    """
    Extract the data from the file.

    For command-line use, `extract_info_cli` is more convenient, since the input file,
    output file and encoding are chosen when the script is run.

    Parameters
    ----------
    data : Data
        The Data object describing the patterns to extract.

    file_path : str
        The path of the file.

    encoding : str
        The encoding of the file. Default is "utf-8".

    Returns
    -------
    pd.DataFrame
        The data extracted from the file, one column per label.
    """
    return data.generate(file_path, encoding)

if __name__ == "__main__":
    d = Data(
        Section(
            Line("MKL number of threads: $$", "N_threads"),
            Line("cmkl_permut total time: $$", "cfunc_time"),
            Line("cmkl_permut permutation time: $$", "permut_time")
        ),
        Section(
            Line("Function permutation_mkl took $$ seconds to run.", "python_func_time")
        )
    )
    d.generate("./new_txt.txt").to_csv("test.csv", index=False)
class Line:
Line(pattern: str, *labels)

Initialize the Line object.

Parameters

pattern : str
    The pattern of the line, e.g. "number of threads: $$", where each "$$" marks a number to capture. The pattern must match the whole line.

*labels : str
    The labels of the captured values, i.e. their column names in the output table. For example, for the pattern "number of threads: $$" the label could be "number of threads". A pattern can contain several placeholders with one label each; for "number of threads: $$, execution time: $$" the labels could be "number of threads" and "execution time":

    Line("number of threads: $$, execution time: $$", "number of threads", "execution time")
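
How a Line with several placeholders turns one text line into labelled values can be sketched as below. analyse here is a hypothetical free function written for illustration, not the Kkit method of the same name. It reads the values from the regex groups, so digits inside the literal text are never picked up, and it converts each value on its own, so an integer and a decimal can appear on the same line.

```python
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same placeholder regex as the module


def analyse(pattern: str, labels: tuple, line: str) -> dict:
    """Map each "$$" capture in `line` to its label; {} if the line does not match."""
    match = re.match("^" + pattern.replace("$$", DATA_PATTERN) + "$", line)
    if match is None:
        return {}
    values = match.groups()  # one string per "$$", in order
    if len(values) != len(labels):
        raise ValueError(f"{len(values)} values but {len(labels)} labels")
    # Convert each value separately: "4" stays an int, "1.5" becomes a float.
    return {label: float(v) if "." in v else int(v) for label, v in zip(labels, values)}


print(analyse("number of threads: $$, execution time: $$",
              ("number of threads", "execution time"),
              "number of threads: 4, execution time: 1.5"))
# {'number of threads': 4, 'execution time': 1.5}
```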
class Section:
Section(*Lines)

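Initialize the Section object.

A Section object contains one or more Line objects. The lines are matched in order, and the sequence repeats as a loop: each line of the file is only checked against the pattern the loop is currently waiting for.

Parameters

*Lines : Line
    The Line objects in this section.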
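
The looping behaviour of a Section can be illustrated with a small stdlib sketch. run_section is a hypothetical helper that mirrors the idea and is not the library code. A line that does not match the pattern the loop is waiting for is skipped, even if it would match another pattern of the same section. In the made-up sample log below, the repeated thread-count line is ignored because the loop is already waiting for an execution time.

```python
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same placeholder regex as the module


def run_section(lines, patterns):
    """Match `lines` against a cyclic list of (pattern, label) pairs."""
    compiled = [(re.compile("^" + t.replace("$$", DATA_PATTERN) + "$"), label)
                for t, label in patterns]
    rows = []
    index = 0  # which pattern the loop is waiting for
    for line in lines:
        regex, label = compiled[index % len(compiled)]
        match = regex.match(line)
        if match is None:
            continue  # not the expected line: skip it
        rows.append((label, match.group(1)))
        index += 1


    return rows


log = [
    "benchmark start",
    "number of threads: 1",
    "number of threads: 1",  # skipped: the loop now expects an execution time
    "execution time: 0.10s",
]
print(run_section(log, [("number of threads: $$", "N_threads"),
                        ("execution time: $$s", "execution_time")]))
# [('N_threads', '1'), ('execution_time', '0.10')]
```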

class Data:
Data(*Sections)

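Initialize the Data object.

A Data object contains one or more Section objects. Each section scans the whole file independently, and the columns from all sections are combined into one table.

Parameters

*Sections : Section
    The Section objects in this data.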
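
How several sections combine can be sketched as follows. run_data is a hypothetical stdlib-only helper, and the log lines are invented sample data modelled on the module's __main__ example; values are kept as strings for brevity. Each section scans all lines with its own loop position, and the captured values are collected per label. Since the module builds its table with pandas.DataFrame, which rejects columns of different lengths with a ValueError, the sketch performs the same check explicitly.

```python
import re

DATA_PATTERN = r" ?(\d+\.?\d*) ?"  # same placeholder regex as the module


def run_data(lines, sections):
    """Hypothetical helper: run every section over ALL lines and merge columns by label."""
    columns = {}
    for section in sections:
        compiled = [(re.compile("^" + t.replace("$$", DATA_PATTERN) + "$"), label)
                    for t, label in section]
        index = 0  # each section keeps its own loop position
        for line in lines:
            regex, label = compiled[index % len(compiled)]
            match = regex.match(line)
            if match:
                columns.setdefault(label, []).append(match.group(1))  # kept as strings
                index += 1
    lengths = {label: len(values) for label, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # pandas.DataFrame(columns) would also refuse columns of unequal length.
        raise ValueError(f"columns have different lengths: {lengths}")
    return columns


# Invented sample lines, modelled on the module's __main__ example.
lines = [
    "MKL number of threads: 1",
    "cmkl_permut total time: 0.52",
    "Function permutation_mkl took 0.60 seconds to run.",
    "MKL number of threads: 2",
    "cmkl_permut total time: 0.27",
    "Function permutation_mkl took 0.33 seconds to run.",
]
sections = [
    [("MKL number of threads: $$", "N_threads"), ("cmkl_permut total time: $$", "cfunc_time")],
    [("Function permutation_mkl took $$ seconds to run.", "python_func_time")],
]
print(run_data(lines, sections))
# {'N_threads': ['1', '2'], 'cfunc_time': ['0.52', '0.27'], 'python_func_time': ['0.60', '0.33']}
```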

def extract_info_cli(data: Data):

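Run the command-line interface for the data extractor.

It parses input_file, -o/--output_file and -e/--encoding from the command line, extracts the data from input_file, and writes the result as CSV.

Parameters

data : Data
    The Data object describing the patterns to extract.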
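
The command line accepted by extract_info_cli can be explored by rebuilding a parser with the same three arguments and passing an explicit argument list to parse_args. build_parser is a hypothetical helper for illustration; in the module, the parser is created inside extract_info_cli and reads sys.argv. When -o is omitted, output_file is None.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper registering the same arguments as extract_info_cli."""
    parser = argparse.ArgumentParser(description="Extract data from a result file into a CSV table.")
    parser.add_argument("input_file", type=str, help="The name of the input file.")
    parser.add_argument("-o", "--output_file", type=str, help="The name of the output file.")
    parser.add_argument("-e", "--encoding", default="utf-8", type=str, help="The encoding to use.")
    return parser


args = build_parser().parse_args(["scalability_result.txt", "-o", "test.csv"])
print(args.input_file, args.output_file, args.encoding)
# scalability_result.txt test.csv utf-8

args = build_parser().parse_args(["scalability_result.txt"])
print(args.output_file)  # None: no -o given
```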

def extract_info(data: Data, file_path: str, encoding='utf-8'):

Extract the data from the file.

For command-line use, extract_info_cli is more convenient, since the input file, output file and encoding are chosen when the script is run.

Parameters

data : Data
    The Data object describing the patterns to extract.

file_path : str
    The path of the file.

encoding : str
    The encoding of the file. Default is "utf-8".

Returns

pd.DataFrame
    The data extracted from the file, one column per label.