Mirror of https://github.com/KevinMidboe/bulk-downloader-for-reddit.git (synced 2026-01-10 19:25:41 +00:00)
Compare commits
21 Commits
82dcd2f63d
08de21a364
af7d3d9151
280147282b
b7baf07fb5
aece2273fb
f807efe4d5
743d887927
da5492858c
cebfc713d2
f522154214
27cd3ee991
29873331e6
8a3dcd68a3
ac323f2abe
32d26fa956
137481cf3e
9b63c55d3e
3a6954c7d3
9a59da0c5f
d56efed1c6
README.md (34 lines changed)
@@ -1,7 +1,7 @@
 # Bulk Downloader for Reddit

 Downloads media from reddit posts.

-## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
+## [Download the latest release here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)

 ## What it can do
 - Can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
@@ -13,8 +13,8 @@ Downloads media from reddit posts.
 - Saves a reusable copy of posts' details that are found so that they can be re-downloaded again
 - Logs failed ones in a file to so that you can try to download them later

-## **[Compiling it from source code](docs/COMPILE_FROM_SOURCE.md)**
-*\* MacOS users have to use this option.*
+## **Compiling it from source code**
+MacOS users have to use this option. See *[here](docs/COMPILE_FROM_SOURCE.md)*

 ## Additional options
 Script also accepts additional options via command-line arguments. Get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
@@ -24,6 +24,30 @@ You need to create an imgur developer app in order API to work. Go to https://ap
 It should redirect you to a page where it shows your **imgur_client_id** and **imgur_client_secret**.

-## [FAQ](docs/FAQ.md)
+## FAQ
+### How can I change my credentials?
+- All of the user data is held in **config.json** file which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit
+  them, there.

-## [Changes on *master*](docs/CHANGELOG.md)
+### What do the dots resemble when getting posts?
+- Each dot means that 100 posts are scanned.
+
+### Getting posts takes too long.
+- You can press *Ctrl+C* to interrupt it and start downloading.
+
+### How are the filenames formatted?
+- **Self posts** and **images** that do not belong to an album and **album folders** are formatted as:
+  `[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`
+  You can use *reddit id* to go to post's reddit page by going to link reddit.com/[REDDIT ID]
+
+- An **image in an album** is formatted as:
+  `[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`
+  Similarly, you can use *imgur id* to go to image's imgur page by going to link imgur.com/[IMGUR ID].
+
+### How do I open self post files?
+- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
+  However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).
+
+  However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS
+
+## [See the changes on *master* here](docs/CHANGELOG.md)
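The FAQ's naming scheme, `[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`, can be sketched as a small helper. Note `format_filename` and its character whitelist are hypothetical illustrations, not the script's actual `nameCorrector`:

```python
def format_filename(submitter: str, title: str, reddit_id: str) -> str:
    """Hypothetical sketch of the [SUBMITTER]_[TITLE]_[REDDIT ID] scheme."""
    # Drop characters that are unsafe in filenames (illustrative whitelist)
    safe_title = "".join(c for c in title if c.isalnum() or c in " -_").strip()
    return f"{submitter}_{safe_title}_{reddit_id}"

print(format_filename("spez", "Test post: hello!", "a1b2c3"))
# spez_Test post hello_a1b2c3
```

The trailing reddit id is what lets you jump back to the post via reddit.com/[REDDIT ID], as the FAQ notes.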
docs/CHANGELOG.md
@@ -1,4 +1,14 @@
 # Changes on *master*
+## [27/01/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/b7baf07fb5998368d87e3c4c36aed40daf820609)
+- Clarified the instructions
+
+## [28/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
+- Adjusted algorith used for extracting gfycat links because of gfycat's design change
+- Ignore space at the end of the given directory
+
 ## [16/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
 - Fix the bug that prevents downloading imgur videos

 ## [15/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/adccd8f3ba03ad124d58643d78dab287a4123a6f)
 - Prints out the title of posts' that are already downloaded
docs/COMPILE_FROM_SOURCE.md
@@ -1,9 +1,9 @@
 # Compiling from source code
 ## Requirements
 ### Python 3 Interpreter
-Latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.
-
-\* *Use Python 3.6.5 if you encounter an issue*
+- This program is designed to work best on **Python 3.6.5** and this version of Python 3 is suggested. See if it is already installed, [here](#finding-the-correct-keyword-for-python).
+- If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.

 ## Using terminal
 ### To open it...
 - **On Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**
docs/FAQ.md (23 lines changed)
@@ -1,23 +0,0 @@
-# FAQ
-## What do the dots resemble when getting posts?
-- Each dot means that 100 posts are scanned.
-
-## Getting posts is taking too long.
-- You can press Ctrl+C to interrupt it and start downloading.
-
-## How are filenames formatted?
-- Self posts and images that are not belong to an album are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
-  You can use *reddit id* to go to post's reddit page by going to link **reddit.com/[REDDIT ID]**
-
-- An image in an imgur album is formatted as **`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`**
-  Similarly, you can use *imgur id* to go to image's imgur page by going to link **imgur.com/[IMGUR ID]**.
-
-## How do I open self post files?
-- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
-  However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).
-
-  However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS
-
-## How can I change my credentials?
-- All of the user data is held in **config.json** file which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit
-  them, there.
requirements.txt
@@ -1,3 +1,4 @@
+bs4
 requests
 praw
 imgurpython
script.py (42 lines changed)
@@ -23,7 +23,7 @@ from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
 __author__ = "Ali Parlakci"
 __license__ = "GPL"
-__version__ = "1.6.2"
+__version__ = "1.6.4.1"
 __maintainer__ = "Ali Parlakci"
 __email__ = "parlakciali@gmail.com"
@@ -265,18 +265,22 @@ class PromptUser:
         if programMode == "subreddit":

-            subredditInput = input("subreddit (enter frontpage for frontpage): ")
+            subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
+                                   " use plus to seperate multi subreddits:" \
+                                   " pics+funny+me_irl etc.)\n\n" \
+                                   "subreddit: ")
             GLOBAL.arguments.subreddit = subredditInput

-            while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
-                subredditInput = input("subreddit: ")
-                GLOBAL.arguments.subreddit += "+" + subredditInput
+            # while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
+            #     subredditInput = input("subreddit: ")
+            #     GLOBAL.arguments.subreddit += "+" + subredditInput
+
+            if " " in GLOBAL.arguments.subreddit:
+                GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit.split())

             # DELETE THE PLUS (+) AT THE END
-            if not subredditInput.lower() == "frontpage":
+            if not subredditInput.lower() == "frontpage" \
+               and GLOBAL.arguments.subreddit[-1] == "+":
                 GLOBAL.arguments.subreddit = GLOBAL.arguments.subreddit[:-1]

             print("\nselect sort type:")
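The hunk above replaces the repeated-prompt loop with a one-shot prompt: whitespace in the input is folded into `+` separators and a trailing `+` is trimmed. A standalone sketch of that normalization (`normalize_subreddits` is a hypothetical helper, not a function in the script):

```python
def normalize_subreddits(raw: str) -> str:
    """Turn input like 'pics funny me_irl' into 'pics+funny+me_irl'."""
    if raw.lower() == "frontpage":
        return raw                      # frontpage is passed through untouched
    joined = "+".join(raw.split())      # collapse any whitespace into "+"
    if joined.endswith("+"):            # delete the plus (+) at the end
        joined = joined[:-1]
    return joined

print(normalize_subreddits("pics funny me_irl"))  # pics+funny+me_irl
print(normalize_subreddits("pics+"))              # pics
```

The same result could be had with `raw.strip("+")`, but the sketch mirrors the diff's explicit trailing-plus check.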
@@ -297,7 +301,7 @@ class PromptUser:
             GLOBAL.arguments.time = "all"

         elif programMode == "multireddit":
-            GLOBAL.arguments.user = input("\nredditor: ")
+            GLOBAL.arguments.user = input("\nmultireddit owner: ")
             GLOBAL.arguments.multireddit = input("\nmultireddit: ")

             print("\nselect sort type:")
@@ -569,7 +573,9 @@ def download(submissions):
         print(f" – {submissions[i]['postType'].upper()}",end="",noPrint=True)

         if isPostExists(submissions[i]):
-            print(f"\n{nameCorrector(submissions[i]['postTitle'])}")
+            print(f"\n" \
+                  f"{submissions[i]['postSubmitter']}_"
+                  f"{nameCorrector(submissions[i]['postTitle'])}")
             print("It already exists")
             duplicates += 1
             downloadedCount -= 1
@@ -633,23 +639,33 @@ def download(submissions):
             downloadedCount -= 1

     if duplicates:
-        print("\n There was {} duplicates".format(duplicates))
+        print(f"\nThere {'were' if duplicates > 1 else 'was'} " \
+              f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")

     if downloadedCount == 0:
-        print(" Nothing downloaded :(")
+        print("Nothing downloaded :(")

     else:
-        print(" Total of {} links downloaded!".format(downloadedCount))
+        print(f"Total of {downloadedCount} " \
+              f"link{'s' if downloadedCount > 1 else ''} downloaded!")

 def main():

+    VanillaPrint(
+        f"\nBulk Downloader for Reddit v{__version__}\n" \
+        f"Written by Ali PARLAKCI – parlakciali@gmail.com\n\n" \
+        f"https://github.com/aliparlakci/bulk-downloader-for-reddit/"
+    )
     GLOBAL.arguments = parseArguments()

     if GLOBAL.arguments.directory is not None:
-        GLOBAL.directory = Path(GLOBAL.arguments.directory)
+        GLOBAL.directory = Path(GLOBAL.arguments.directory.strip())
     else:
-        GLOBAL.directory = Path(input("download directory: "))
+        GLOBAL.directory = Path(input("\ndownload directory: ").strip())

     print("\n"," ".join(sys.argv),"\n",noPrint=True)
+    print(f"Bulk Downloader for Reddit v{__version__}\n",noPrint=True
+    )

     try:
         checkConflicts()
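The summary-message changes above swap fixed strings (with the grammar bug "There was 2 duplicates") for conditional f-strings. A minimal standalone sketch of that technique (`summary` is a hypothetical helper returning the text instead of printing it):

```python
def summary(duplicates: int, downloaded: int) -> str:
    """Build the end-of-run summary with singular/plural agreement."""
    lines = []
    if duplicates:
        lines.append(f"There {'were' if duplicates > 1 else 'was'} "
                     f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")
    if downloaded == 0:
        lines.append("Nothing downloaded :(")
    else:
        lines.append(f"Total of {downloaded} "
                     f"link{'s' if downloaded > 1 else ''} downloaded!")
    return "\n".join(lines)

print(summary(1, 0))  # There was 1 duplicate / Nothing downloaded :(
print(summary(2, 3))  # There were 2 duplicates / Total of 3 links downloaded!
```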
@@ -1,13 +1,15 @@
 import io
+import json
 import os
 import sys
 import urllib.request
 from html.parser import HTMLParser
-from multiprocessing import Queue
 from pathlib import Path
 from urllib.error import HTTPError

 import imgurpython
+from multiprocessing import Queue
+from bs4 import BeautifulSoup

 from src.errors import (AlbumNotDownloadedCompletely, FileAlreadyExistsError,
                         FileNameTooLong, ImgurLoginError,
@@ -66,7 +68,8 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
         ]

     opener = urllib.request.build_opener()
-    opener.addheaders = headers
+    if not "imgur" in imageURL:
+        opener.addheaders = headers
     urllib.request.install_opener(opener)

     if not (os.path.isfile(fileDir)):
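The guard added above installs the custom request headers only for non-imgur URLs, presumably because imgur downloads misbehave with them; imgur links fall back to urllib's defaults. A standalone sketch of that decision (the function name and placeholder User-Agent are illustrative, not the script's actual values):

```python
import urllib.request

def build_opener_for(image_url: str) -> urllib.request.OpenerDirector:
    """Return an opener; custom headers are skipped for imgur links."""
    headers = [("User-Agent", "bulk-downloader-for-reddit")]  # placeholder UA
    opener = urllib.request.build_opener()
    if "imgur" not in image_url:
        opener.addheaders = headers  # replaces urllib's default header list
    return opener

# The caller would then do:
#     urllib.request.install_opener(build_opener_for(url))
# so later urlretrieve/urlopen calls pick the opener up globally.
```

Note that `install_opener` is process-global state, which is why the original code rebuilds and reinstalls the opener per download.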
@@ -441,24 +444,16 @@ class Gfycat:
         url = "https://gfycat.com/" + url.split('/')[-1]

-        pageSource = (urllib.request.urlopen(url).read().decode().split('\n'))
+        pageSource = (urllib.request.urlopen(url).read().decode())

-        theLine = pageSource[lineNumber]
-        lenght = len(query)
-        link = []
+        soup = BeautifulSoup(pageSource, "html.parser")
+        attributes = {"data-react-helmet":"true","type":"application/ld+json"}
+        content = soup.find("script",attrs=attributes)

-        for i in range(len(theLine)):
-            if theLine[i:i+lenght] == query:
-                cursor = (i+lenght)+1
-                while not theLine[cursor] == '"':
-                    link.append(theLine[cursor])
-                    cursor += 1
-                break
-
-        if "".join(link) == "":
+        if content is None:
             raise NotADownloadableLinkError("Could not read the page source")

-        return "".join(link)
+        return json.loads(content.text)["video"]["contentUrl"]

 class Direct:
     def __init__(self,directory,POST):
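The rewritten Gfycat resolver stops scanning the page line by line for a quoted string and instead parses the JSON-LD `<script>` block the page embeds. A standalone sketch of that extraction against canned page source (the real code fetches `https://gfycat.com/<id>` first, and the sample markup and URL below are illustrative):

```python
import json
from bs4 import BeautifulSoup

SAMPLE_PAGE = """
<html><head>
<script data-react-helmet="true" type="application/ld+json">
{"video": {"contentUrl": "https://giant.gfycat.com/Example.mp4"}}
</script>
</head></html>
"""

def extract_video_url(page_source: str) -> str:
    """Pull the mp4 URL out of the page's JSON-LD metadata block."""
    soup = BeautifulSoup(page_source, "html.parser")
    attributes = {"data-react-helmet": "true", "type": "application/ld+json"}
    content = soup.find("script", attrs=attributes)
    if content is None:
        raise ValueError("Could not read the page source")
    return json.loads(content.text)["video"]["contentUrl"]

print(extract_video_url(SAMPLE_PAGE))  # https://giant.gfycat.com/Example.mp4
```

Parsing structured metadata is far more robust than the old character-cursor scan, which is why the changelog entry attributes this rewrite to "gfycat's design change".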