Merge pull request #30 from aliparlakci/SelfDownloader

- Added a self post download feature
- Made searching quicker by writing posts to the log file at the end of the search
- Applied the long-file-name bug fix to the remaining download classes
- Updated the README file to make it more minimal
This commit is contained in:
aliparlakci
2018-07-10 02:46:37 +03:00
committed by GitHub
4 changed files with 113 additions and 48 deletions

View File

@@ -5,23 +5,28 @@ This program downloads imgur, gfycat and direct image and video links of saved p
 ## Table of Contents
+- [What can it do?](#what-can-it-do)
 - [Requirements](#requirements)
 - [Setting up the script](#setting-up-the-script)
   - [Creating an imgur app](#creating-an-imgur-app)
 - [Program Modes](#program-modes)
-  - [saved mode](#saved-mode)
-  - [submitted mode](#submitted-mode)
-  - [upvoted mode](#upvoted-mode)
-  - [subreddit mode](#subreddit-mode)
-  - [multireddit mode](#multireddit-mode)
-  - [link mode](#link-mode)
-  - [log read mode](#log-read-mode)
 - [Running the script](#running-the-script)
   - [Using the command line arguments](#using-the-command-line-arguments)
   - [Examples](#examples)
 - [FAQ](#faq)
 - [Changelog](#changelog)
-  - [release-1.0.0](#release-100)
+## What can it do?
+### It...
+- can get posts from: the frontpage, subreddits, multireddits, a redditor's submissions, upvoted and saved posts, search results, or plain reddit links
+- sorts posts by hot, top, new and so on
+- downloads imgur albums, gfycat links, [self posts](#i-can-t-open-the-self-posts-) and any link to a direct image
+- skips posts that already exist
+- puts post titles in file names
+- puts every post in its subreddit's folder
+- saves a reusable copy of the details of found posts so that they can be downloaded again
+- logs failed ones in a file so that you can try to download them later
+- can be run by double-clicking on Windows (though I don't recommend it)
 ## Requirements
 - Python 3.x*
@@ -49,38 +54,27 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
 ## Program Modes
 All the program modes are activated with command-line arguments as shown [here](#using-the-command-line-arguments)
-### saved mode
-In saved mode, the program gets posts from given user's saved posts.
-### submitted mode
-In submitted mode, the program gets posts from given user's submitted posts.
-### upvoted mode
-In submitted mode, the program gets posts from given user's upvoted posts.
-### subreddit mode
-In subreddit mode, the program gets posts from given subreddits* that is sorted by given type and limited by given number.
-Multiple subreddits can be given
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-### multireddit mode
-In multireddit mode, the program gets posts from given user's given multireddit that is sorted by given type and limited by given number.
-### link mode
-In link mode, the program gets posts from given reddit link.
-You may customize the behaviour with `--sort`, `--time`, `--limit`.
-*You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).*
-## log read mode
-Two log files are created each time *script.py* runs.
-- **POSTS** Saves all the posts without filtering.
-- **FAILED** Keeps track of posts that are tried to be downloaded but failed.
-In log mode, the program takes a log file which created by itself, reads posts and tries downloading them again.
-Running log read mode for FAILED.json file once after the download is complete is **HIGHLY** recommended as unexpected problems may occur.
+- **saved mode**
+  - Gets posts from the given user's saved posts.
+- **submitted mode**
+  - Gets posts from the given user's submitted posts.
+- **upvoted mode**
+  - Gets posts from the given user's upvoted posts.
+- **subreddit mode**
+  - Gets posts from the given subreddit or subreddits, sorted by the given type and limited to the given number.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **multireddit mode**
+  - Gets posts from the given user's given multireddit, sorted by the given type and limited to the given number.
+- **link mode**
+  - Gets posts from the given reddit link.
+  - You may customize the behaviour with `--sort`, `--time`, `--limit`.
+  - You may also use search in this mode. See [`py -3 script.py --help`](#using-the-command-line-arguments).
+- **log read mode**
+  - Takes a log file which the script itself created (a JSON file), reads the posts in it and tries downloading them again.
+  - Running log read mode on the FAILED.json file once after a download finishes is **HIGHLY** recommended, as unexpected problems may occur.
 ## Running the script
-**WARNING** *DO NOT* let more than *1* instance of script run as it interferes with IMGUR Request Rate.
+**DO NOT** let more than one instance of the script run, as it interferes with the imgur request rate.
 ### Using the command line arguments
 If no arguments are passed, the program will prompt you for the arguments below, which means you may start the script by double-clicking it (at least on Windows).
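Log read mode, described above, re-reads one of the JSON log files and retries the posts in it. A minimal sketch of that reading step, assuming the `{index: [error message, post details]}` layout that `download()` writes to the FAILED file in this PR (`load_failed_posts` is a hypothetical helper name, not project code):

```python
import json

def load_failed_posts(path):
    # Read a FAILED-style log file and return just the post-detail dicts,
    # ready to be fed back into the downloader.
    # Assumed layout: {"1": ["<error message>", {<post details>}], ...}
    with open(path, "r", encoding="utf-8") as f:
        log = json.load(f)
    return [entry[1] for entry in log.values()]
```

The error messages are dropped here; only the post details are needed for a retry.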
@@ -89,7 +83,7 @@ Open up the [terminal](https://www.reddit.com/r/NSFW411/comments/8vtnl8/meta_i_m
 Run the script.py file from the terminal with command-line arguments. Here is the help page:
-**ATTENTION** Use `.\` for current directory and `..\` for upper directory when using short directories, otherwise it might act weird.
+Use `.\` for the current directory and `..\` for the parent directory when using relative paths, otherwise it may misbehave.
 ```console
 $ py -3 script.py --help
@@ -166,6 +160,10 @@ py -3 script.py C:\\NEW_FOLDER\\ANOTHER_FOLDER --log UNNAMED_FOLDER\\FAILED.json
 ### I can't start up the script no matter what.
 - Try `python3` or `python` or `py -3`, as Python has real naming issues with its executable
+### I can't open the self posts.
+- Self posts are stored on reddit as Markdown, so the script downloads them as Markdown in order not to lose their styling. There is a great Chrome extension [here](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the downloaded files with Chrome.
 ## Changelog
-### v1.0.0
-- Initial release
+### 10/07/2018
+- Added support for *self* posts
+- Getting posts is now quicker

View File

@@ -11,7 +11,7 @@ import sys
 import time
 from pathlib import Path, PurePath
-from src.downloader import Direct, Gfycat, Imgur
+from src.downloader import Direct, Gfycat, Imgur, Self
 from src.parser import LinkDesigner
 from src.searcher import getPosts
 from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
@@ -451,7 +451,22 @@ def download(submissions):
                 print(exception)
                 FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
                 downloadedCount -= 1
+        elif submissions[i]['postType'] == 'self':
+            print("SELF")
+            try:
+                Self(directory,submissions[i])
+            except FileAlreadyExistsError:
+                print("It already exists")
+                downloadedCount -= 1
+                duplicates += 1
+            except Exception as exception:
+                print(exception)
+                FAILED_FILE.add({int(i+1):[str(exception),submissions[i]]})
+                downloadedCount -= 1
         else:
             print("No match found, skipping...")
             downloadedCount -= 1
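The `elif` chain in `download()` routes each submission to a downloader class by its `postType`. The same routing can be sketched table-driven (the dict below is a hypothetical illustration, not project code; the handler names are the classes imported in this PR):

```python
# Maps each postType produced by the searcher to its downloader class name.
HANDLERS = {
    "imgur": "Imgur",
    "gfycat": "Gfycat",
    "direct": "Direct",
    "self": "Self",
}

def pick_handler(submission):
    # Return the downloader name for a submission,
    # or None for the "No match found, skipping..." case.
    return HANDLERS.get(submission.get("postType"))
```

A table keeps the per-type error handling (duplicates, failures) in one place instead of repeating it per branch.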

View File

@@ -1,3 +1,4 @@
+import io
 import os
 import sys
 import urllib.request
@@ -16,7 +17,7 @@ except ModuleNotFoundError:
     install("imgurpython")
     from imgurpython import *
+VanillaPrint = print
 print = printToFile

 def dlProgress(count, blockSize, totalSize):
@@ -294,3 +295,45 @@ class Direct:
         tempDir = directory / (POST['postId']+".tmp")
         getFile(fileDir,tempDir,POST['postURL'])
+class Self:
+    def __init__(self,directory,post):
+        if not os.path.exists(directory): os.makedirs(directory)
+
+        title = nameCorrector(post['postTitle'])
+        print(title+"_"+post['postId']+".md")
+
+        fileDir = title+"_"+post['postId']+".md"
+        fileDir = directory / fileDir
+
+        if Path.is_file(fileDir):
+            raise FileAlreadyExistsError
+
+        try:
+            self.writeToFile(fileDir,post)
+        except FileNotFoundError:
+            fileDir = post['postId']+".md"
+            fileDir = directory / fileDir
+            self.writeToFile(fileDir,post)
+
+    @staticmethod
+    def writeToFile(directory,post):
+        content = ("## ["
+                   + post["postTitle"]
+                   + "]("
+                   + post["postURL"]
+                   + ")\n"
+                   + post["postContent"]
+                   + "\n\n---\n\n"
+                   + "submitted by [u/"
+                   + post["postSubmitter"]
+                   + "](https://www.reddit.com/user/"
+                   + post["postSubmitter"]
+                   + ")")
+
+        with io.open(directory,"w",encoding="utf-8") as FILE:
+            VanillaPrint(content,file=FILE)
+
+        print("Downloaded")
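For reference, the Markdown layout that `writeToFile` assembles (a linked title, the post body, a rule, and an attribution line) can be sketched as a standalone function; `build_self_post_markdown` is a hypothetical mirror of the concatenation above, not project code:

```python
def build_self_post_markdown(post):
    # Same pieces writeToFile joins: "## [title](url)", the selftext,
    # a horizontal rule, then "submitted by [u/name](profile url)".
    return ("## [" + post["postTitle"] + "](" + post["postURL"] + ")\n"
            + post["postContent"]
            + "\n\n---\n\n"
            + "submitted by [u/" + post["postSubmitter"]
            + "](https://www.reddit.com/user/" + post["postSubmitter"] + ")")
```

Saving the post as Markdown rather than plain text is what preserves the styling mentioned in the README's FAQ.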

View File

@@ -308,6 +308,10 @@ def redditSearcher(posts,SINGLE_POST=False):
     imgurCount = 0
     global directCount
     directCount = 0
+    global selfCount
+    selfCount = 0
+
+    allPosts = {}

     postsFile = createLogFile("POSTS")
@@ -356,13 +360,15 @@ def redditSearcher(posts,SINGLE_POST=False):
             printSubmission(submission,subCount,orderCount)
             subList.append(details)
-            postsFile.add({subCount:[details]})
+            allPosts = {**allPosts,**details}
+
+    postsFile.add(allPosts)

     if not len(subList) == 0:
         print(
             "\nTotal of {} submissions found!\n"\
-            "{} GFYCATs, {} IMGURs and {} DIRECTs\n"
-            .format(len(subList),gfycatCount,imgurCount,directCount)
+            "{} GFYCATs, {} IMGURs, {} DIRECTs and {} SELF POSTS\n"
+            .format(len(subList),gfycatCount,imgurCount,directCount,selfCount)
             )
         return subList
     else:
@@ -372,6 +378,7 @@ def checkIfMatching(submission):
     global gfycatCount
     global imgurCount
     global directCount
+    global selfCount

     try:
         details = {'postId':submission.id,
@@ -397,13 +404,15 @@ def checkIfMatching(submission):
             imgurCount += 1
             return details

-        elif isDirectLink(submission.url) is True:
+        elif isDirectLink(submission.url):
             details['postType'] = 'direct'
             directCount += 1
             return details

         elif submission.is_self:
             details['postType'] = 'self'
+            details['postContent'] = submission.selftext
+            selfCount += 1
             return details

 def printSubmission(SUB,validNumber,totalNumber):
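The branches visible in this hunk of `checkIfMatching` check direct links before falling back to `submission.is_self`. A hedged sketch of that ordering (the real function also recognizes imgur and gfycat URLs, which this hunk doesn't show; `is_direct_link` below is a naive stand-in for the project's `isDirectLink`, keyed on common media extensions):

```python
def classify_post(url, is_self):
    # Hypothetical stand-in for isDirectLink: treat a URL as a direct
    # media link if it ends in a common image/video extension.
    def is_direct_link(u):
        return u.rsplit(".", 1)[-1].lower() in {"jpg", "jpeg", "png",
                                                "gif", "mp4", "webm"}

    if is_direct_link(url):     # checked first, as in the diff
        return "direct"
    if is_self:                 # self posts are the later branch
        return "self"
    return None                 # no match: the post is skipped
```

The ordering matters: a self post whose URL somehow looked like a direct link would be classified as `direct`, mirroring the `elif` chain.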