33 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Ali Parlakçı | 43cf0a4d42 | Typo fix | 2018-08-06 07:42:52 +03:00 |
| Ali Parlakci | 3693cf46f8 | Updated version | 2018-08-06 07:40:23 +03:00 |
| Ali Parlakci | 04152e8554 | Updated changelog | 2018-08-06 07:37:35 +03:00 |
| Ali Parlakci | 210238d086 | Sending header when requesting a file | 2018-08-06 07:35:43 +03:00 |
| Ali Parlakci | 90e071354f | Update changelog | 2018-08-04 10:25:29 +03:00 |
| Ali Parlakci | 426089d0f3 | Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit | 2018-08-04 10:23:22 +03:00 |
| Ali Parlakci | 7ae6c6385d | Do not print post type to console | 2018-08-04 10:22:41 +03:00 |
| Ali Parlakçı | 97d15f9974 | Update README.md | 2018-08-01 13:08:09 +03:00 |
| Ali Parlakci | 172cd72dc1 | Update changelog | 2018-07-30 13:37:29 +03:00 |
| Ali Parlakci | af29492951 | Hide KeyboardInterrupt | 2018-07-30 13:36:48 +03:00 |
| Ali Parlakci | 5633b301f3 | Bug fix | 2018-07-30 13:36:27 +03:00 |
| Ali Parlakci | 5ed855af28 | Update changelog | 2018-07-30 13:23:34 +03:00 |
| Ali Parlakci | 7ccf2fb7f9 | Open web browser as prompting for imgur credentials | 2018-07-30 13:22:30 +03:00 |
| Ali Parlakçı | 2297e9ed86 | Update README.md | 2018-07-26 20:04:07 +03:00 |
| Ali Parlakçı | 401e014059 | Update README.md | 2018-07-26 20:03:25 +03:00 |
| Ali Parlakçı | eb31d38c44 | Update README.md | 2018-07-26 19:39:51 +03:00 |
| Ali Parlakçı | 747fefea14 | Update README.md | 2018-07-26 19:36:50 +03:00 |
| Ali Parlakçı | 80cc4fade3 | Update README.md | 2018-07-26 19:33:47 +03:00 |
| Ali Parlakçı | c26843c7fc | Set theme jekyll-theme-cayman | 2018-07-26 19:21:55 +03:00 |
| Ali Parlakçı | a14edc9f5a | Update README.md | 2018-07-26 15:08:23 +03:00 |
| Ali Parlakci | d685860c22 | Update version | 2018-07-26 12:25:54 +03:00 |
| Ali Parlakci | dcf9f35273 | Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit | 2018-07-26 12:25:50 +03:00 |
| Ali Parlakci | 7fdf03aa24 | Added new line after 'GETTING POSTS' | 2018-07-26 12:25:23 +03:00 |
| Ali Parlakçı | 25d61a4c78 | Update README.md | 2018-07-26 12:23:08 +03:00 |
| Ali Parlakci | 558eb107f4 | Update changelog | 2018-07-26 12:01:00 +03:00 |
| Ali Parlakci | 6e74630050 | Typo fix | 2018-07-26 11:59:29 +03:00 |
| Ali Parlakci | 2fd9248715 | Added quit after finish option | 2018-07-26 11:15:13 +03:00 |
| Ali Parlakci | 457b8cd21c | Added remaining credits to log file | 2018-07-26 11:08:37 +03:00 |
| Ali Parlakci | e953456ead | Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit | 2018-07-26 10:10:48 +03:00 |
| Ali Parlakci | ed0564fba0 | Improve verbose mode | 2018-07-26 10:08:57 +03:00 |
| Ali Parlakçı | 5378555f74 | Update COMPILE_FROM_SOURCE.md | 2018-07-26 09:24:50 +03:00 |
| Ali Parlakçı | 95ef308915 | Update COMPILE_FROM_SOURCE.md | 2018-07-26 09:22:14 +03:00 |
| Ali Parlakçı | 436f867f2e | Update COMPILE_FROM_SOURCE.md | 2018-07-26 09:22:03 +03:00 |

8 changed files with 141 additions and 97 deletions


@@ -1,7 +1,7 @@
# Bulk Downloader for Reddit
This program downloads imgur, gfycat and direct image and video links of saved posts from a reddit account. It is written in Python 3.
**PLEASE** post any issue you have with the script to the [Issues](https://github.com/aliparlakci/bulk-downloader-for-reddit/issues) tab. Since I don't have any testers or contributors, I need your feedback.
Downloads media from reddit posts.
## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## What it can do
- Can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
@@ -13,33 +13,17 @@ This program downloads imgur, gfycat and direct image and video links of saved p
- Saves a reusable copy of posts' details that are found so that they can be re-downloaded again
- Logs failed ones in a file so that you can try to download them later
## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## How it works
- For **Windows** and **Linux** users, there are executable files that run without installing a third-party program. But if you are paranoid like me, you can **[compile it from source code](docs/COMPILE_FROM_SOURCE.md)**.
- In Windows, double click on bulk-downloader-for-reddit file
- In Linux, extract files to a folder and open terminal inside it. Type **`./bulk-downloader-for-reddit`**
- **MacOS** users have to **[compile it from source code](docs/COMPILE_FROM_SOURCE.md)**.
Script also accepts **command-line arguments**, get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
## Additional options
Script also accepts additional options via command-line arguments. Get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
## Setting up the script
Because this is not a commercial app, you need to create an imgur developer app for the API to work.
### Creating an imgur app
* Go to https://api.imgur.com/oauth2/addclient
* Enter a name into the **Application Name** field.
* Pick **Anonymous usage without user authorization** as an **Authorization type**\*
* Enter your email into the Email field.
* Complete the CAPTCHA
* Click **submit** button
You need to create an imgur developer app for the API to work. Go to https://api.imgur.com/oauth2/addclient and fill in the form (it does not really matter how you fill it in). It should redirect you to a page that shows your **imgur_client_id** and **imgur_client_secret**.
It should redirect to a page which shows your **imgur_client_id** and **imgur_client_secret**
\* Select **OAuth 2 authorization without a callback URL** first then select **Anonymous usage without user authorization** if it says *Authorization callback URL: required*
## FAQ
### What do the dots represent when getting posts?
- Each dot means that 100 posts are scanned.
@@ -47,8 +31,8 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
### Getting posts is taking too long.
- You can press Ctrl+C to interrupt it and start downloading.
### How downloaded files' names are formatted?
- Images that do not belong to an album or self posts are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
### How are filenames formatted?
- Self posts and images that do not belong to an album are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
You can use the *reddit id* to open the post's reddit page at **reddit.com/[REDDIT ID]**
- An image in an imgur album is formatted as **`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`**
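
As a rough illustration of the two naming patterns described in this FAQ entry, here is a minimal sketch. The function names and sample values are hypothetical and are not taken from the script, which may also sanitize or shorten titles before using them.

```python
def post_filename(submitter, title, reddit_id):
    """Build a name of the form [SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]."""
    return f"{submitter}_{title}_{reddit_id}"


def album_image_filename(item_number, image_title, imgur_id):
    """Build a name of the form [ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]."""
    return f"{item_number}_{image_title}_{imgur_id}"


def post_link(reddit_id):
    """Return the short link for a post, as in reddit.com/[REDDIT ID]."""
    return f"reddit.com/{reddit_id}"


# Hypothetical sample values, for illustration only.
print(post_filename("some_user", "A post title", "abc123"))
print(album_image_filename(1, "First image", "XyZ9876"))
print(post_link("abc123"))
```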
@@ -65,9 +49,23 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
them, there.
## Changes on *master*
### [06/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/210238d0865febcb57fbd9f0b0a7d3da9dbff384)
- Sending headers when requesting a file in order not to be rejected by server
### [04/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/426089d0f35212148caff0082708a87017757bde)
- Disabled printing post types to console
### [30/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/af294929510f884d92b25eaa855c29fc4fb6dcaa)
- Now opens the web browser and goes to Imgur when prompting for Imgur credentials
### [26/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Improved verbose mode
- Minimized the console output
- Added quit option for automatically quitting the program after the process finishes
### [25/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/1623722138bad80ae39ffcd5fb38baf80680deac)
- Added verbose mode
- Stylize the console output
- Stylized the console output
### [24/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7a68ff3efac9939f9574c2cef6184b92edb135f4)
- Added OP's name to file names (backwards compatible)
@@ -75,19 +73,19 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
- Improved exception handling
### [23/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7314e17125aa78fd4e6b28e26fda7ec7db7e0147)
- Split download() function
- Split download() function
- Added erome support
- Remove exclude feature
- Bug fix
- Removed exclude feature
- Bug fixes
### [22/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/a67da461d2fcd70672effcb20c8179e3224091bb)
### [22/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/6e7463005051026ad64006a8580b0b5dc9536b8c)
- Put log files in a folder named "LOG_FILES"
- Fixed the bug that makes multireddit mode unusable
### [21/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/4a8c2377f9fb4d60ed7eeb8d50aaf9a26492462a)
- Added exclude mode
### [20/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/commit/7548a010198fb693841ca03654d2c9bdf5742139)
### [20/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/7548a010198fb693841ca03654d2c9bdf5742139)
- "0" input for no limit
- Fixed the bug that recognizes non-image direct links as image links
@@ -97,7 +95,7 @@ It should redirect to a page which shows your **imgur_client_id** and **imgur_cl
- Fixed the bug that prevents downloading some gfycat URLs
### [13/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/9f831e1b784a770c82252e909462871401a05c11)
- Change config.json file's path to home directory
- Changed config.json file's path to home directory
### [12/07/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/50a77f6ba54c24f5647d5ea4e177400b71ff04a7)
- Added binaries for Windows and Linux

_config.yml Normal file

@@ -0,0 +1 @@
theme: jekyll-theme-cayman


@@ -5,12 +5,12 @@ See **[compiling from source](COMPILE_FROM_SOURCE.md)** page first unless you ar
***Use*** `.\bulk-downloader-for-reddit.exe` ***or*** `./bulk-downloader-for-reddit` ***if you are using the executable***.
```console
$ python script.py --help
usage: script.py [-h] [--directory DIRECTORY] [--link link] [--saved]
[--submitted] [--upvoted] [--log LOG FILE]
[--subreddit SUBREDDIT [SUBREDDIT ...]]
usage: script.py [-h] [--directory DIRECTORY] [--NoDownload] [--verbose]
[--quit] [--link link] [--saved] [--submitted] [--upvoted]
[--log LOG FILE] [--subreddit SUBREDDIT [SUBREDDIT ...]]
[--multireddit MULTIREDDIT] [--user redditor]
[--search query] [--sort SORT TYPE] [--limit Limit]
[--time TIME_LIMIT] [--NoDownload] [--verbose]
[--time TIME_LIMIT]
This program downloads media from reddit posts
@@ -19,6 +19,10 @@ optional arguments:
--directory DIRECTORY, -d DIRECTORY
Specifies the directory where posts will be downloaded
to
--NoDownload Just gets the posts and stores them in a file for
downloading later
--verbose, -v Verbose Mode
--quit, -q Auto quit after the process finishes
--link link, -l link Get posts from link
--saved Triggers saved mode
--submitted Gets posts of --user
@@ -38,9 +42,6 @@ optional arguments:
--limit Limit default: unlimited
--time TIME_LIMIT Either hour, day, week, month, year or all. default:
all
--NoDownload Just gets the posts and store them in a file for
downloading later
--verbose, -v Verbose Mode
```
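
The three relocated options (`--NoDownload`, `--verbose`, `--quit`) are plain boolean switches. A minimal sketch of how such flags behave with `argparse`, assuming a reduced parser that declares only these options plus `--directory`; the `build_parser` helper is hypothetical and the full script defines many more arguments:

```python
import argparse


def build_parser():
    """Reduced parser with only the options discussed here (illustrative)."""
    parser = argparse.ArgumentParser(
        description="This program downloads media from reddit posts")
    parser.add_argument("--directory", "-d",
                        help="Specifies the directory where posts will be "
                             "downloaded to",
                        metavar="DIRECTORY")
    # store_true flags default to False and become True when present.
    parser.add_argument("--NoDownload",
                        help="Just gets the posts and stores them in a file "
                             "for downloading later",
                        action="store_true",
                        default=False)
    parser.add_argument("--verbose", "-v",
                        help="Verbose Mode",
                        action="store_true",
                        default=False)
    parser.add_argument("--quit", "-q",
                        help="Auto quit after the process finishes",
                        action="store_true",
                        default=False)
    return parser


# Passing an explicit list avoids reading sys.argv in this sketch.
args = build_parser().parse_args(["--directory", "downloads", "-v", "-q"])
print(args.directory, args.NoDownload, args.verbose, args.quit)
```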
# Examples


@@ -1,16 +1,14 @@
# Compiling from source code
## Requirements
### Python 3 Interpreter
Latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option is mandatory.
The latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting the **Add Python 3 to PATH** option when installing the software is mandatory.
\* *Use Python 3.6.5 if you encounter an issue*
## Using terminal
### To open it...
- **On Windows 8/8.1/10**: Press the File tab on **Windows Explorer**, click on **Open Windows PowerShell** or **Open Windows Command Prompt** or look for *Command Prompt* or *PowerShell* in *Start Menu*.
- **On Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**
- **On Windows 7**: Press **WindowsKey+R**, type **cmd** and hit Enter or look for *Command Prompt* or *PowerShell* in *Start Menu*.
- **On Linux**: Right-click in a folder and select **Open Terminal** or press **Ctrl+Alt+T** or look for **Terminal** in the programs.
- **On Linux**: Right-click in a folder and select **Open Terminal** or press **Ctrl+Alt+T**.
- **On MacOS**: Look for an app called **Terminal**.
@@ -39,4 +37,4 @@ python -m pip install -r requirements.txt
---
Now, you can go to [Using command-line arguments](COMMAND_LINE_ARGUMENTS.md)
Now, you can go to [Using command-line arguments](COMMAND_LINE_ARGUMENTS.md)


@@ -10,10 +10,11 @@ import logging
import os
import sys
import time
import webbrowser
from io import StringIO
from pathlib import Path, PurePath
from src.downloader import Direct, Gfycat, Imgur, Self, Erome
from src.downloader import Direct, Erome, Gfycat, Imgur, Self
from src.errors import *
from src.parser import LinkDesigner
from src.searcher import getPosts
@@ -22,7 +23,7 @@ from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
__author__ = "Ali Parlakci"
__license__ = "GPL"
__version__ = "1.5.2"
__version__ = "1.6.1"
__maintainer__ = "Ali Parlakci"
__email__ = "parlakciali@gmail.com"
@@ -38,20 +39,34 @@ def getConfig(configFileName):
if "reddit_refresh_token" in content:
if content["reddit_refresh_token"] == "":
FILE.delete("reddit_refresh_token")
if not all(False if content.get(key,"") == "" else True for key in keys):
print(
"Go to this URL and fill the form: " \
"https://api.imgur.com/oauth2/addclient\n" \
"Enter the client id and client secret here:"
)
webbrowser.open("https://api.imgur.com/oauth2/addclient",new=2)
for key in keys:
try:
if content[key] == "":
raise KeyError
except KeyError:
print(key,": ")
FILE.add({key:input()})
FILE.add({key:input(" "+key+": ")})
return jsonFile(configFileName).read()
else:
FILE = jsonFile(configFileName)
configDictionary = {}
print(
"Go to this URL and fill the form: " \
"https://api.imgur.com/oauth2/addclient\n" \
"Enter the client id and client secret here:"
)
webbrowser.open("https://api.imgur.com/oauth2/addclient",new=2)
for key in keys:
configDictionary[key] = input(key + ": ")
configDictionary[key] = input(" "+key+": ")
FILE.add(configDictionary)
return FILE.read()
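
# Illustrative sketch, not part of the diff: the change above prompts for
# credentials only when a configured key is missing or empty. The helper name
# and the key list below are assumptions (the README mentions
# imgur_client_id and imgur_client_secret); the real script keeps its own
# list in `keys`.

```python
def missing_keys(content, keys):
    """Return the keys that are absent from content or set to an empty string."""
    return [key for key in keys if content.get(key, "") == ""]


# Assumed key names, taken from the README text rather than the script.
KEYS = ["imgur_client_id", "imgur_client_secret"]

config = {"imgur_client_id": "abc", "imgur_client_secret": ""}
to_prompt = missing_keys(config, KEYS)
if to_prompt:
    # The script would print the instructions and open the browser once here,
    # then ask for each missing value.
    print("Would prompt for:", ", ".join(to_prompt))
```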
@@ -66,6 +81,22 @@ def parseArguments(arguments=[]):
help="Specifies the directory where posts will be " \
"downloaded to",
metavar="DIRECTORY")
parser.add_argument("--NoDownload",
help="Just gets the posts and stores them in a file" \
" for downloading later",
action="store_true",
default=False)
parser.add_argument("--verbose","-v",
help="Verbose Mode",
action="store_true",
default=False)
parser.add_argument("--quit","-q",
help="Auto quit after the process finishes",
action="store_true",
default=False)
parser.add_argument("--link","-l",
help="Get posts from link",
@@ -137,18 +168,6 @@ def parseArguments(arguments=[]):
choices=["all","hour","day","week","month","year"],
metavar="TIME_LIMIT",
type=str)
parser.add_argument("--NoDownload",
help="Just gets the posts and store them in a file" \
" for downloading later",
action="store_true",
default=False)
parser.add_argument("--verbose","-v",
help="Verbose Mode",
action="store_true",
default=False)
if arguments == []:
return parser.parse_args()
@@ -486,15 +505,19 @@ def downloadPost(SUBMISSION):
+ " Minutes " \
+ str(int(IMGUR_RESET_TIME%60)) \
+ " Seconds")
if credit['ClientRemaining'] < 25 or credit['UserRemaining'] < 25:
print(
"==> Client: {} - User: {} - Reset {}".format(
credit['ClientRemaining'],
credit['UserRemaining'],
USER_RESET
),end=""
)
printCredit = {"noPrint":False}
else:
printCredit = {"noPrint":True}
print(
"==> Client: {} - User: {} - Reset {}\n".format(
credit['ClientRemaining'],
credit['UserRemaining'],
USER_RESET
),end="",**printCredit
)
if not (credit['UserRemaining'] == 0 or \
credit['ClientRemaining'] == 0):
@@ -535,10 +558,9 @@ def download(submissions):
FAILED_FILE = createLogFile("FAILED")
for i in range(subsLenght):
print(
f"\n({i+1}/{subsLenght}) ({submissions[i]['postType'].upper()}) " \
f"(r/{submissions[i]['postSubreddit']})",end=""
)
print(f"\n({i+1}/{subsLenght}) r/{submissions[i]['postSubreddit']}",
end="")
print(f" {submissions[i]['postType'].upper()}",end="",noPrint=True)
if isPostExists(submissions[i]):
print("\nIt already exists")
@@ -620,7 +642,7 @@ def main():
else:
GLOBAL.directory = Path(input("download directory: "))
print("\n"," ".join(sys.argv),"\n")
print("\n"," ".join(sys.argv),"\n",noPrint=True)
try:
checkConflicts()
@@ -692,4 +714,4 @@ if __name__ == "__main__":
exc_info=full_exc_info(sys.exc_info()))
print(log_stream.getvalue())
input("\nPress enter to quit\n")
if not GLOBAL.arguments.quit: input("\nPress enter to quit\n")


@@ -54,6 +54,21 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
As file names that are too long do not seem to work.
"""
headers = [
("User-Agent", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 " \
"(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11"),
("Accept", "text/html,application/xhtml+xml,application/xml;" \
"q=0.9,*/*;q=0.8"),
("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3"),
("Accept-Encoding", "none"),
("Accept-Language", "en-US,en;q=0.8"),
("Connection", "keep-alive")
]
opener = urllib.request.build_opener()
opener.addheaders = headers
urllib.request.install_opener(opener)
if not (os.path.isfile(fileDir)):
for i in range(3):
try:
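
# Illustrative sketch, not part of the diff: the added lines install a
# process-wide opener so that later urllib.request.urlopen calls (and
# urlretrieve, which calls urlopen internally) send browser-like headers.
# The shortened header list is an assumption for brevity, and no request
# is actually sent here.

```python
import urllib.request

# Shortened, illustrative header list; the diff above uses a longer one.
headers = [
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)"),
    ("Accept-Language", "en-US,en;q=0.8"),
]

opener = urllib.request.build_opener()
# addheaders replaces the default "Python-urllib/x.y" User-Agent entry.
opener.addheaders = headers
# After this call, urllib.request.urlopen uses the opener above.
urllib.request.install_opener(opener)

print(dict(opener.addheaders)["User-Agent"])
```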


@@ -126,8 +126,6 @@ def getPosts(args):
if args["user"] == "me":
args["user"] = str(reddit.user.me())
# print("\nGETTING POSTS\n.\n.\n.\n")
if not "search" in args:
if args["sort"] == "top" or args["sort"] == "controversial":
keyword_params = {
@@ -159,7 +157,7 @@ def getPosts(args):
sort=args["sort"],
subreddit=args["subreddit"],
time=args["time"]
).upper()
).upper(),noPrint=True
)
return redditSearcher(
reddit.subreddit(args["subreddit"]).search(
@@ -187,7 +185,7 @@ def getPosts(args):
"saved posts\nuser:{username}\nlimit={limit}\n".format(
username=reddit.user.me(),
limit=args["limit"]
).upper()
).upper(),noPrint=True
)
return redditSearcher(reddit.user.me().saved(limit=args["limit"]))
@@ -202,7 +200,7 @@ def getPosts(args):
sort=args["sort"],
subreddit=args["subreddit"],
time=args["time"]
).upper()
).upper(),noPrint=True
)
return redditSearcher(
getattr(reddit.front,args["sort"]) (**keyword_params)
@@ -216,7 +214,7 @@ def getPosts(args):
sort=args["sort"],
subreddit=args["subreddit"],
time=args["time"]
).upper()
).upper(),noPrint=True
)
return redditSearcher(
getattr(
@@ -234,7 +232,7 @@ def getPosts(args):
sort=args["sort"],
multireddit=args["multireddit"],
time=args["time"]
).upper()
).upper(),noPrint=True
)
try:
return redditSearcher(
@@ -255,7 +253,7 @@ def getPosts(args):
sort=args["sort"],
user=args["user"],
time=args["time"]
).upper()
).upper(),noPrint=True
)
return redditSearcher(
getattr(
@@ -268,7 +266,7 @@ def getPosts(args):
"upvoted posts of {user}\nlimit: {limit}\n".format(
user=args["user"],
limit=args["limit"]
).upper()
).upper(),noPrint=True
)
try:
return redditSearcher(
@@ -278,7 +276,7 @@ def getPosts(args):
raise InsufficientPermission
elif "post" in args:
print("post: {post}\n".format(post=args["post"]).upper())
print("post: {post}\n".format(post=args["post"]).upper(),noPrint=True)
return redditSearcher(
reddit.submission(url=args["post"]),SINGLE_POST=True
)
@@ -307,7 +305,8 @@ def redditSearcher(posts,SINGLE_POST=False):
allPosts = {}
print("GETTING POSTS")
print("\nGETTING POSTS")
if GLOBAL.arguments.verbose: print("\n")
postsFile = createLogFile("POSTS")
if SINGLE_POST:
@@ -344,7 +343,7 @@ def redditSearcher(posts,SINGLE_POST=False):
sys.stdout.flush()
if subCount % 1000 == 0:
sys.stdout.write("\n")
sys.stdout.write("\n"+" "*14)
sys.stdout.flush()
try:
@@ -368,17 +367,22 @@ def redditSearcher(posts,SINGLE_POST=False):
allPosts[subCount] = [details]
except KeyboardInterrupt:
print("\nKeyboardInterrupt",end="")
print("\nKeyboardInterrupt",noPrint=True)
postsFile.add(allPosts)
if not len(subList) == 0:
print(
f"\n\nTotal of {len(subList)} submissions found!\n"\
f"{gfycatCount} GFYCATs, {imgurCount} IMGURs, " \
f"{eromeCount} EROMEs, {directCount} DIRECTs " \
f"and {selfCount} SELF POSTS"
)
if not len(subList) == 0:
if GLOBAL.arguments.NoDownload or GLOBAL.arguments.verbose:
print(
f"\n\nTotal of {len(subList)} submissions found!"
)
print(
f"{gfycatCount} GFYCATs, {imgurCount} IMGURs, " \
f"{eromeCount} EROMEs, {directCount} DIRECTs " \
f"and {selfCount} SELF POSTS",noPrint=True
)
else:
print()
return subList
else:
raise NoMatchingSubmissionFound


@@ -90,7 +90,7 @@ def createLogFile(TITLE):
return FILE
def printToFile(*args, **kwargs):
def printToFile(*args, noPrint=False,**kwargs):
"""Print to both CONSOLE and
CONSOLE LOG file in a folder with the time stamp in its name
"""
@@ -98,7 +98,12 @@ def printToFile(*args, **kwargs):
TIME = str(time.strftime("%d-%m-%Y_%H-%M-%S",
time.localtime(GLOBAL.RUN_TIME)))
folderDirectory = GLOBAL.directory / "LOG_FILES" / TIME
print(*args,**kwargs)
if not noPrint or \
GLOBAL.arguments.verbose or \
"file" in kwargs:
print(*args,**kwargs)
if not path.exists(folderDirectory):
makedirs(folderDirectory)
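
# Illustrative sketch, not part of the diff: the `noPrint` keyword lets a
# message be recorded in the log while staying off the console unless verbose
# mode is on. The `VERBOSE` flag and the `log` parameter are stand-ins assumed
# for this example; the real function reads GLOBAL.arguments.verbose and
# writes to a file under LOG_FILES.

```python
import io

VERBOSE = False  # stand-in for GLOBAL.arguments.verbose


def print_to_file(*args, noPrint=False, log=None, **kwargs):
    """Echo to the console unless suppressed, and always write to the log."""
    # Same condition as in the diff: print when not suppressed, when verbose,
    # or when the caller explicitly passed a target file.
    if not noPrint or VERBOSE or "file" in kwargs:
        print(*args, **kwargs)
    if log is not None:
        # Reuse sep/end from the caller but redirect the output to the log.
        print(*args, **{**kwargs, "file": log})


log = io.StringIO()
print_to_file("GETTING POSTS", log=log)                # console and log
print_to_file("debug detail", noPrint=True, log=log)   # log only
```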