Added speed tests and plans for better parsing and speed.
This commit is contained in:
@@ -469,3 +469,99 @@ scan_video(path):
|
||||
```
|
||||
|
||||
## Video
|
||||
|
||||
|
||||
## Runtimes
|
||||
At commit #30 we are walking through the directory with the function shown in core above. A run through The Office US of 201 episodes gives us a total runtime of 17.716. Ideas of what is slowing down the runtime:
|
||||
|
||||
- Walking through the entire directory tree.
|
||||
- Checking that it is a folder that exists.
|
||||
- Guessing the episode name, number and info with the guessit library.
|
||||
- Langdetect of a subtitle file.
|
||||
|
||||
```
|
||||
Only scan: real 0m0.745s
|
||||
Only subs: real 0m4.273s
|
||||
Only videos: real 0m13.280s
|
||||
```
|
||||
|
||||
Clearly something happening in video that takes time.
|
||||
> Also more video objects that subs
|
||||
|
||||
## Moving away from guessit
|
||||
I wanted to check how accurate hits we could get with regex. The test is to compare the results from a simple reqex function with the output of guessit. Our code is the following:
|
||||
|
||||
```
|
||||
def removeLeadingZero(number):
|
||||
stringedNumber = str(number)
|
||||
if (len(stringedNumber) > 1 and stringedNumber[0] == '0'):
|
||||
return int(stringedNumber[1:])
|
||||
return int(number)
|
||||
|
||||
class episode(object):
|
||||
def __init__(self, path):
|
||||
self.path = path
|
||||
self.season = self.getSeasonNumber()
|
||||
self.episode = self.getEpisodeNumber()
|
||||
|
||||
def getSeasonNumber(self):
|
||||
m = re.search('[sS][0-9]{1,2}', self.path)
|
||||
if m:
|
||||
seasonNumber = re.sub('[sS]', '', m.group(0))
|
||||
return removeLeadingZero(seasonNumber)
|
||||
|
||||
def getEpisodeNumber(self):
|
||||
m = re.search('[eE][0-9]{1,2}', self.path)
|
||||
if m:
|
||||
episodeNumber = re.sub('[eE]', '', m.group(0))
|
||||
return removeLeadingZero(episodeNumber)
|
||||
```
|
||||
|
||||
With this we got:
|
||||
|
||||
```
|
||||
seasonedParser:$ time ./scandir.py '/mnt/mainframe/shows/'
|
||||
Total: 5926, missed was: 33
|
||||
|
||||
real 2m3.560s
|
||||
user 1m43.832s
|
||||
sys 0m0.840s
|
||||
```
|
||||
|
||||
Our main misses where episodes with multiple episodes within. Examples follow:
|
||||
|
||||
| Resolved | Filename | Manual guess | Reason for mismatch |
|
||||
| --- | --- | --- | --- |
|
||||
|[ ]| The.Office S03E24&25 - The Job [720p].mkv | 3 : 24 | Double episode |
|
||||
|[ ]| Seinfeld.S07E21E22.The.Bottle.Deposit.720p.WEBrip.AAC.EN-SUB.x264-[MULVAcoded].mkv | 7 : 21 | Double episode |
|
||||
|[ ]| Friends S10E17 E18.mkv | 10 : 17 | Double episode with spacing |
|
||||
|[x]| S00E121.The.Seinfeld.Story.mkv | 0 : 12 | Special episode |
|
||||
|[ ]| Brooklyn.Nine-Nine.S04E11-E12.The.Fugitive.Pt.1-2.1080p.WEB-DL.DD5.1.H264.mkv | 4 : 11 | Double episode |
|
||||
|[ ]| Greys.Anatomy.S06E01.E02.720p.HDTV.x264.srt | 6 : 1 | Double episode |
|
||||
|[ ]| Its.Always.Sunny.In.Philadelphia.S04E05E06.DSR.XviD-NoTV.avi | 4 : 5 | Multiple episode |
|
||||
|[ ]| Chicago.PD.S02E20.Law.and.Order.SVU.S16E20.720p.HDTV.X264-DIMENSION[rarbg].mkv | 2 : 20 | Guessed wrong part |
|
||||
|[ ]| 03x16 - The Excelsior Acquisition.avi | None | Separated by x |
|
||||
|[ ]| new.girl.421.hdtv-lol.mp4 | None | No s or ep id chars
|
||||
|
||||
|
||||
#### Excepts longer episode number
|
||||
Except longer episode number, see *S00E121*.
|
||||
|
||||
```
|
||||
def getEpisodeNumber(self):
|
||||
m = re.search('[eE][0-9]{1,3}', self.path)
|
||||
if m:
|
||||
episodeNumber = re.sub('[eE]', '', m.group(0))
|
||||
return removeLeadingZero(episodeNumber)
|
||||
```
|
||||
|
||||
Now we got 4 less misses
|
||||
|
||||
```
|
||||
seasonedParser:$ time ./scandir.py '/mnt/mainframe/shows/'
|
||||
Total: 5926, missed was: 29
|
||||
|
||||
real 2m0.766s
|
||||
user 1m41.482s
|
||||
sys 0m0.851s
|
||||
```
|
||||
Reference in New Issue
Block a user