Kkit.mder

A multithreaded m3u8 download module. It downloads the .ts segments listed in an m3u8 file, joins them into a single .mp4 output file, and can resume an interrupted download.

Example:

#test.py
from Kkit import mder

downloader = mder.m3u8_downloader(m3u8_file_path='./test.m3u8', temp_file_path='./', mp4_path='./test.mp4', num_of_threads=10)
# parameters
# 1. m3u8_file_path
#    default : none, required (type : str)
# 2. url_prefix
#    default : None         (type : str)
#    base URL for m3u8 files that list relative segment URLs
# 3. temp_file_path
#    default : '.'          (type : str)
# 4. mp4_path
#    default : './test.mp4' (type : str)
# 5. num_of_threads
#    default : 10           (type : int)

downloader.start()
# parameters
# 1. mod
#    default : 0            (type : int)
#    mod 0: after a complete download, delete the TS folder and the m3u8 file
#    mod 1: after a complete download, delete the m3u8 file only
#    mod 2: after a complete download, delete the TS folder only
#    mod 3: after a complete download, keep both the TS folder and the m3u8 file
# 2. time_out
#    default : 60           (type : int)(units : seconds)
#    passed as requests.get(url, timeout=time_out) for each segment
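
The four `mod` values reduce to two independent yes/no decisions, which `start()` applies only after every segment has been downloaded. The sketch below restates that mapping so it can be checked at a glance; `cleanup_plan` is a hypothetical helper written for illustration, not part of Kkit.mder.

```python
from typing import Tuple


def cleanup_plan(mod: int) -> Tuple[bool, bool]:
    """Return (delete_m3u8, delete_ts_folder) for a start() mod value.

    Hypothetical helper mirroring the mod checks in m3u8_downloader.start();
    it is not part of Kkit.mder.
    """
    if mod not in (0, 1, 2, 3):
        # start() raises mod_ERROR for any other value
        raise ValueError('mod must be 0, 1, 2 or 3')
    delete_m3u8 = mod in (0, 1)
    delete_ts_folder = mod in (0, 2)
    return delete_m3u8, delete_ts_folder


for mod in range(4):
    print(mod, cleanup_plan(mod))
```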

Before download

The structure of ./ is:

.
├── test.m3u8
└── test.py

While downloading

The structure of ./ is:

.
├── TS
│   ├── qzCFnDUZE9_720_5308_0000.ts
│   ├── qzCFnDUZE9_720_5308_0001.ts
│   ├── qzCFnDUZE9_720_5308_0002.ts
│   ├── qzCFnDUZE9_720_5308_0003.ts
│   ├── qzCFnDUZE9_720_5308_0004.ts
│   ├── qzCFnDUZE9_720_5308_0005.ts
│   ├── qzCFnDUZE9_720_5308_0006.ts
│   ├── qzCFnDUZE9_720_5308_0007.ts
│   ├── qzCFnDUZE9_720_5308_0008.ts
│   ├── qzCFnDUZE9_720_5308_0009.ts
│   └── qzCFnDUZE9_720_5308_0010.ts  
├── test.m3u8
└── test.py

Progress bar: <<*>>  29% 500/1752 [01:33<04:02] <<*>>

TS is the temporary folder that holds the downloaded .ts segments. Its path is %temp_file_path%/TS; in this example it is ./TS. If the download does not complete, the m3u8 file and the TS folder are kept. To continue, create a new downloader with the same m3u8 file and temp_file_path and call start() again; segments already present in TS are skipped.
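
Resuming relies on the segment file name: it is the last path component of the segment URL with any query string removed, and a segment whose name is already in TS is skipped. A minimal sketch of that bookkeeping, assuming the same naming rule; `segment_name` and `pending_segments` are hypothetical helpers, and the example URLs are made up:

```python
import os
import tempfile
from typing import List


def segment_name(url: str) -> str:
    # Same rule as the module: last path component, query string dropped
    return url.split('/')[-1].split('?')[0]


def pending_segments(urls: List[str], ts_dir: str) -> List[str]:
    """Return the URLs whose segment file is not yet present in ts_dir."""
    already = set(os.listdir(ts_dir)) if os.path.isdir(ts_dir) else set()
    return [url for url in urls if segment_name(url) not in already]


urls = [
    'https://example.com/hls/seg_0000.ts?token=abc',
    'https://example.com/hls/seg_0001.ts?token=abc',
    'https://example.com/hls/seg_0002.ts?token=abc',
]
with tempfile.TemporaryDirectory() as ts_dir:
    # Pretend seg_0000.ts survived an interrupted run
    open(os.path.join(ts_dir, 'seg_0000.ts'), 'wb').close()
    print(pending_segments(urls, ts_dir))  # only seg_0001 and seg_0002 remain
```

Because the check only compares file names, it cannot tell a complete segment from one that was cut off mid-write; writing each segment under a temporary name and renaming it once the write finishes is one way to avoid that.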

After a successful download

The structure of ./ is:

.
├── test.mp4
└── test.py

If a .ts segment fails to download (a non-200 status or an exception such as a timeout), the module retries it up to 3 times and prints each failure and retry to the command line.

Near the end of a run, the command-line output looks like this:

<<*>>  99% 1737/1752 [05:35<00:22] <<*>>
thread0 Time out ERROR qzCFnDUZE9_720_5308_1710.ts
thread2 Time out ERROR qzCFnDUZE9_720_5308_1722.ts
thread0 redownload successfully qzCFnDUZE9_720_5308_1710.ts
<<*>>  99% 1738/1752 [06:20<03:19] <<*>>
thread2 redownload successfully qzCFnDUZE9_720_5308_1722.ts
<<*>> 100% 1752/1752 [06:26<00:00] <<*>>
downloading finished 100.00%
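
The retry policy above is a bounded loop: one initial attempt plus up to three retries, after which the URL is recorded as failed. A sketch of that pattern, assuming a hypothetical caller-supplied `fetch(url) -> (status_code, body)` function so it runs without a network; it treats the built-in `TimeoutError` as the timeout signal, whereas `requests` raises `requests.exceptions.Timeout`:

```python
from typing import Callable, List, Optional, Tuple

MAX_RETRIES = 3  # retries after the first attempt, matching the documented limit


def download_with_retry(url: str,
                        fetch: Callable[[str], Tuple[int, bytes]],
                        failed: List[str]) -> Optional[bytes]:
    """Call fetch(url) up to 1 + MAX_RETRIES times; return the body or None."""
    for attempt in range(1 + MAX_RETRIES):
        try:
            status, body = fetch(url)
        except TimeoutError:
            status, body = None, b''
        if status == 200:
            if attempt > 0:
                print('redownload successfully', url)
            return body
        if attempt < MAX_RETRIES:
            print('attempt', attempt + 1, 'failed for', url, '- retrying')
    # Every attempt failed: remember the URL so it can be reported at the end
    print('giving up on', url)
    failed.append(url)
    return None


attempts = {'n': 0}


def flaky_fetch(url: str) -> Tuple[int, bytes]:
    # Hypothetical fetcher: times out twice, then succeeds
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise TimeoutError('simulated timeout')
    return 200, b'segment bytes'


failed: List[str] = []
print(download_with_retry('seg_0001.ts', flaky_fetch, failed), failed)
```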

Restart

To resume an incomplete download, reuse the corresponding TS folder and .m3u8 file: create a new downloader with the same m3u8_file_path and temp_file_path, then call start().

Module source

r"""
A multithreaded m3u8 download module. It downloads the .ts segments listed in an m3u8 file, joins them into a single .mp4 output file, and can resume an interrupted download.

Example:

```python
#test.py
from Kkit import mder

downloader = mder.m3u8_downloader(m3u8_file_path='./test.m3u8', temp_file_path='./', mp4_path='./test.mp4', num_of_threads=10)
# parameters
# 1. m3u8_file_path
#    default : none, required (type : str)
# 2. url_prefix
#    default : None         (type : str)
#    base URL for m3u8 files that list relative segment URLs
# 3. temp_file_path
#    default : '.'          (type : str)
# 4. mp4_path
#    default : './test.mp4' (type : str)
# 5. num_of_threads
#    default : 10           (type : int)

downloader.start()
# parameters
# 1. mod
#    default : 0            (type : int)
#    mod 0: after a complete download, delete the TS folder and the m3u8 file
#    mod 1: after a complete download, delete the m3u8 file only
#    mod 2: after a complete download, delete the TS folder only
#    mod 3: after a complete download, keep both the TS folder and the m3u8 file
# 2. time_out
#    default : 60           (type : int)(units : seconds)
#    passed as requests.get(url, timeout=time_out) for each segment
```

**Before download**

The structure of ./ is:
```
.
├── test.m3u8
└── test.py
```

**While downloading**

The structure of ./ is:
```
.
├── TS
│   ├── qzCFnDUZE9_720_5308_0000.ts
│   ├── qzCFnDUZE9_720_5308_0001.ts
│   ├── qzCFnDUZE9_720_5308_0002.ts
│   ├── qzCFnDUZE9_720_5308_0003.ts
│   ├── qzCFnDUZE9_720_5308_0004.ts
│   ├── qzCFnDUZE9_720_5308_0005.ts
│   ├── qzCFnDUZE9_720_5308_0006.ts
│   ├── qzCFnDUZE9_720_5308_0007.ts
│   ├── qzCFnDUZE9_720_5308_0008.ts
│   ├── qzCFnDUZE9_720_5308_0009.ts
│   └── qzCFnDUZE9_720_5308_0010.ts
├── test.m3u8
└── test.py
```
Progress bar:  <<\*>>  29% 500/1752 [01:33<04:02] <<\*>>

TS is the temporary folder that holds the downloaded .ts segments. Its path is %temp_file_path%/TS; in this example it is ./TS. If the download does not complete, the m3u8 file and the TS folder are kept. To continue, create a new downloader with the same m3u8 file and temp_file_path and call start() again; segments already present in TS are skipped.

**After a successful download**

The structure of ./ is:
```
.
├── test.mp4
└── test.py
```

If a .ts segment fails to download (a non-200 status or an exception such as a timeout), the module retries it up to 3 times and prints each failure and retry to the command line.

Near the end of a run, the command-line output looks like this:
```
<<*>>  99% 1737/1752 [05:35<00:22] <<*>>
thread0 Time out ERROR qzCFnDUZE9_720_5308_1710.ts
thread2 Time out ERROR qzCFnDUZE9_720_5308_1722.ts
thread0 redownload successfully qzCFnDUZE9_720_5308_1710.ts
<<*>>  99% 1738/1752 [06:20<03:19] <<*>>
thread2 redownload successfully qzCFnDUZE9_720_5308_1722.ts
<<*>> 100% 1752/1752 [06:26<00:00] <<*>>
downloading finished 100.00%
```
**Restart**

To resume an incomplete download, reuse the corresponding TS folder and .m3u8 file: create a new downloader with the same m3u8_file_path and temp_file_path, then call start().

"""

# A multithreaded m3u8 download module; the number of threads is configurable.
# author: walkureHHH
# last modified: 2020/06/17
import requests
from urllib.parse import urljoin
from threading import Thread
from threading import Lock
import os
import shutil
from tqdm import tqdm


class thread_num_ERROR(Exception):
    """
    Thread number error.
    Raised when the number of threads is less than or equal to 0.
    """
    pass

class mod_ERROR(Exception):
    """
    Mod error.
    Raised when mod is not one of 0, 1, 2 or 3.
    """
    pass

class m3u8_downloader:
    """
    M3u8 downloader.
    """
    temp_file_path = ''
    """@private"""
    mp4_path = ''
    """@private"""
    num_of_threads = 0
    """@private"""
    m3u8_file_path = ''
    """@private"""
    urls = []
    """@private"""
    names = []
    """@private"""
    has_download_name = []
    """@private"""
    cant_dow = []
    """@private"""
    total = 0
    """@private"""
    lock = Lock()
    """@private"""
    def __init__(self,m3u8_file_path, url_prefix=None,temp_file_path='.',mp4_path='./test.mp4',num_of_threads=10):
        """
        Initialize the m3u8 downloader.

        Parameters
        ----------
        m3u8_file_path : str
            The path of the m3u8 file.

        url_prefix : str
            The prefix of the url. Default is None.
            Some m3u8 files list relative segment URLs; each one is resolved
            against this prefix with urllib.parse.urljoin.
            For example, with the prefix 'http://www.example.com', the url
            '/video/1.ts' becomes 'http://www.example.com/video/1.ts'.

        temp_file_path : str
            The path of the temporary folder (where the *.ts files are stored). Default is '.'.

        mp4_path : str
            The path of the result mp4 file. Default is './test.mp4'.

        num_of_threads : int
            The number of threads. Default is 10.

        """
        if num_of_threads <= 0:
            raise thread_num_ERROR('the number of threads must be greater than 0')
        self.mp4_path = mp4_path
        self.temp_file_path = temp_file_path
        self.num_of_threads = num_of_threads
        self.m3u8_file_path = m3u8_file_path
        # Per-instance state: the class-level list and lock would otherwise be
        # shared by every downloader created in the same process
        self.cant_dow = []
        self.lock = Lock()
        ts_dir = os.path.join(self.temp_file_path, 'TS')
        if os.path.exists(ts_dir):
            print('warning: the temporary folder already exists\n'
                  'please confirm that it contains the video fragments you need')
            # Resume: segments already on disk are skipped by __download
            self.has_download_name = os.listdir(ts_dir)
        else:
            os.mkdir(ts_dir)
            self.has_download_name = []
        temp_url = []
        with open(self.m3u8_file_path, 'r') as m3u8:
            for line in m3u8:
                line = line.strip()  # drop the newline and surrounding whitespace
                # Segment URLs are the non-empty lines that are not tags or comments
                if line and not line.startswith('#'):
                    temp_url.append(line)
        if url_prefix is not None:
            temp_url = [urljoin(url_prefix, i) for i in temp_url]
        self.total = len(temp_url)
        self.names = [i.split('/')[-1].split('?')[0] for i in temp_url]
        # Round-robin: segment i is handled by thread i % num_of_threads
        self.urls = [[] for j in range(0, self.num_of_threads)]
        for index, el in enumerate(temp_url):
            self.urls[index%self.num_of_threads].append(el)

    def start(self,mod = 0, time_out = 60):
        """
        Start download.

        Parameters
        ----------
        mod : int
            What to clean up after a complete download. Default is 0.
            0: delete the m3u8 file and the temporary folder.
            1: delete the m3u8 file only.
            2: delete the temporary folder only.
            3: delete nothing.

        time_out : int
            Timeout in seconds passed to requests.get for each segment. Default is 60.
        """
        if mod not in [0,1,2,3]:
            raise mod_ERROR('mod must be 0, 1, 2 or 3')
        with tqdm(total=self.total,bar_format='<<*>> {percentage:3.0f}% {n_fmt}/{total_fmt} [{elapsed}<{remaining}] <<*>> ') as jdt:
            Threads = []
            for i in range(self.num_of_threads):
                thread = Thread(target=self.__download, args=(self.urls[i],'thread'+str(i),jdt,time_out))
                Threads.append(thread)
            for threads in Threads:
                threads.start()
            for threads in Threads:
                threads.join()
        ts_dir = os.path.join(self.temp_file_path, 'TS')
        # Count only playlist segments, so stray files in TS cannot fake completion
        downloaded = set(self.has_download_name)
        done = sum(1 for name in self.names if name in downloaded)
        percent = '%.02f%%' % (done / len(self.names) * 100 if self.names else 100.0)
        if done == len(self.names):
            print('downloading finished', percent)
            # Join the segments in playlist order. 'wb' overwrites any earlier output
            # instead of appending to it. The .ts data is concatenated byte for byte;
            # it is not remuxed into an MP4 container.
            with open(self.mp4_path, 'wb') as mp4:
                for names in self.names:
                    with open(os.path.join(ts_dir, names), 'rb') as ts:
                        shutil.copyfileobj(ts, mp4)
            if mod == 0 or mod == 1:
                os.remove(self.m3u8_file_path)
            if mod == 0 or mod == 2:
                shutil.rmtree(ts_dir)
        else:
            print('----------------------------------------------------------------')
            for cantdow_urls in self.cant_dow:
                print('downloading fail:',cantdow_urls)
            print('incomplete downloading',percent)

    def __download(self, download_list, thread_name, jdt, time_out):
        retries = 3  # documented limit: up to 3 retries after the first attempt
        ts_dir = os.path.join(self.temp_file_path, 'TS')
        for urls in download_list:
            name = urls.split('/')[-1].split('?')[0]
            if name in self.has_download_name:
                # Already downloaded by an earlier run: only advance the progress bar
                with self.lock:
                    jdt.update(1)
                continue
            for i in range(1 + retries):
                try:
                    conn = requests.get(urls, timeout=time_out)
                except requests.exceptions.Timeout:
                    self.__report_failure(thread_name, 'Time out ERROR', urls, name, i, retries)
                    continue
                except requests.exceptions.RequestException as e:
                    # Connection errors etc. are reported by type, not as timeouts
                    self.__report_failure(thread_name, type(e).__name__, urls, name, i, retries)
                    continue
                if conn.status_code == 200:
                    # Write under a temporary name and rename when complete, so an
                    # interrupted write never leaves a truncated .ts that a resumed
                    # run would mistake for a finished segment
                    part_path = os.path.join(ts_dir, name + '.part')
                    with open(part_path, 'wb') as ts:
                        ts.write(conn.content)
                    os.replace(part_path, os.path.join(ts_dir, name))
                    with self.lock:
                        if i != 0:
                            print('\n'+thread_name, 'redownload successfully', name)
                        self.has_download_name.append(name)
                        jdt.update(1)
                    break
                self.__report_failure(thread_name, conn.status_code, urls, name, i, retries)

    def __report_failure(self, thread_name, reason, url, name, attempt, retries):
        # attempt is 0-based; attempt == retries means the last try has failed
        with self.lock:
            if attempt < retries:
                print('\n'+thread_name, reason, name, 'Retry '+str(attempt + 1)+'/'+str(retries))
            else:
                print('\n'+thread_name, reason, name, 'giving up')
                self.cant_dow.append(url)


if __name__ == "__main__":
    a = m3u8_downloader('/mnt/c/Users/kylis/Downloads/r.m3u8',temp_file_path='.',mp4_path='./1.mp4', num_of_threads=17)
    a.start()

class thread_num_ERROR(builtins.Exception):

Thread number error. Raised when the number of threads is less than or equal to 0.

class mod_ERROR(builtins.Exception):

Mod error. Raised when mod is not one of 0, 1, 2 or 3.

class m3u8_downloader:

M3u8 downloader.

m3u8_downloader(m3u8_file_path, url_prefix=None, temp_file_path='.', mp4_path='./test.mp4', num_of_threads=10)

Initialize the m3u8 downloader.

Parameters

m3u8_file_path : str
    The path of the m3u8 file.

url_prefix : str
    The prefix of the url. Default is None. Some m3u8 files list relative segment URLs; each one is resolved against this prefix with urllib.parse.urljoin. For example, with the prefix 'http://www.example.com', the url '/video/1.ts' becomes 'http://www.example.com/video/1.ts'.

temp_file_path : str
    The path of the temporary folder (where the *.ts files are stored). Default is '.'.

mp4_path : str
    The path of the result mp4 file. Default is './test.mp4'.

num_of_threads : int
    The number of threads. Default is 10.
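
The constructor's preparation step reads the playlist, keeps the lines that are not tags or comments (those start with `#`) as segment URIs, optionally resolves them against `url_prefix` with `urllib.parse.urljoin`, and deals them round-robin to the threads. A self-contained sketch under those assumptions; `plan_segments` is a hypothetical name, and it also drops blank lines and surrounding whitespace:

```python
from typing import List, Optional
from urllib.parse import urljoin


def plan_segments(playlist_text: str, num_of_threads: int,
                  url_prefix: Optional[str] = None) -> List[List[str]]:
    """Split the segment URIs of an m3u8 playlist into per-thread lists."""
    if num_of_threads <= 0:
        raise ValueError('num_of_threads must be greater than 0')
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()  # drop surrounding whitespace
        # Segment URIs are the non-empty lines that are not tags or comments
        if line and not line.startswith('#'):
            uris.append(line)
    if url_prefix is not None:
        uris = [urljoin(url_prefix, uri) for uri in uris]
    # Round-robin: segment i goes to thread i % num_of_threads
    per_thread: List[List[str]] = [[] for _ in range(num_of_threads)]
    for index, uri in enumerate(uris):
        per_thread[index % num_of_threads].append(uri)
    return per_thread


playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
/video/0.ts
#EXTINF:10.0,
/video/1.ts
#EXTINF:10.0,
/video/2.ts
#EXT-X-ENDLIST
"""
print(plan_segments(playlist, 2, url_prefix='http://www.example.com'))
```

Relative URIs resolve against the directory of the base URL, so `urljoin('http://www.example.com/hls/index.m3u8', 'seg.ts')` gives `'http://www.example.com/hls/seg.ts'`. Round-robin keeps each thread's share within one segment of the others, regardless of playlist length.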

def start(self, mod=0, time_out=60):

Start download.

Parameters

mod : int
    What to clean up after a complete download. Default is 0.
    0: delete the m3u8 file and the temporary folder.
    1: delete the m3u8 file only.
    2: delete the temporary folder only.
    3: delete nothing.

time_out : int
    Timeout in seconds passed to requests.get for each segment. Default is 60.
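
After all threads finish, the segments are written to `mp4_path` in playlist order. A sketch of that merge step, assuming the segments sit in one folder under the names derived from their URLs; `merge_segments` is a hypothetical helper. It opens the output once in `'wb'` mode, so a leftover file at the same path is overwritten rather than appended to, and streams each segment with `shutil.copyfileobj` instead of reading it into memory at once. The join is byte for byte: the result holds the concatenated .ts data and is not remuxed into an MP4 container.

```python
import os
import shutil
import tempfile
from typing import List


def merge_segments(ts_dir: str, names: List[str], out_path: str) -> int:
    """Concatenate ts_dir/<name> for each name, in order, into out_path.

    Returns the number of bytes written.
    """
    # Check first, so a missing segment never truncates an existing output file
    missing = [n for n in names if not os.path.isfile(os.path.join(ts_dir, n))]
    if missing:
        raise FileNotFoundError('missing segments: %s' % ', '.join(missing))
    with open(out_path, 'wb') as out:  # 'wb' truncates any earlier output
        for name in names:
            with open(os.path.join(ts_dir, name), 'rb') as segment:
                shutil.copyfileobj(segment, out)
        return out.tell()


with tempfile.TemporaryDirectory() as work:
    names = ['seg_0000.ts', 'seg_0001.ts']
    for i, name in enumerate(names):
        with open(os.path.join(work, name), 'wb') as f:
            f.write(bytes([i]) * 4)  # 4 dummy bytes per segment
    print(merge_segments(work, names, os.path.join(work, 'out.mp4')))
```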