21 Commits

Author SHA1 Message Date
Ali Parlakci
82dcd2f63d Bug fix 2019-01-27 17:05:31 +03:00
Ali Parlakci
08de21a364 Updated Python3 version 2019-01-27 16:32:43 +03:00
Ali Parlakci
af7d3d9151 Moved FAQ 2019-01-27 16:32:00 +03:00
Ali Parlakci
280147282b 27 jan update 2019-01-27 16:06:31 +03:00
Ali Parlakci
b7baf07fb5 Added instructions 2019-01-27 15:59:24 +03:00
Ali Parlakci
aece2273fb Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit 2018-08-28 16:28:29 +03:00
Ali Parlakci
f807efe4d5 Ignore space at the end of directory 2018-08-28 16:27:29 +03:00
Ali Parlakci
743d887927 Ignore space at the end of directory 2018-08-28 16:24:14 +03:00
Ali Parlakci
da5492858c Add bs4 2018-08-28 16:15:22 +03:00
Ali Parlakci
cebfc713d2 Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit 2018-08-28 16:12:01 +03:00
Ali Parlakci
f522154214 Update version 2018-08-28 16:11:48 +03:00
Ali Parlakçı
27cd3ee991 Changed getting gfycat links' algorithm 2018-08-28 16:10:15 +03:00
Ali Parlakci
29873331e6 Typo fix 2018-08-23 16:41:07 +03:00
Ali Parlakci
8a3dcd68a3 Update version 2018-08-23 12:16:31 +03:00
Ali Parlakci
ac323f2abe Bug fix 2018-08-23 12:09:56 +03:00
Ali Parlakci
32d26fa956 Print out github link at start 2018-08-20 15:13:42 +03:00
Ali Parlakci
137481cf3e Print out program info 2018-08-18 14:51:20 +03:00
Ali Parlakci
9b63c55d3e Print out version info before starting 2018-08-17 21:25:01 +03:00
Ali Parlakci
3a6954c7d3 Update version 2018-08-16 19:55:45 +03:00
Ali Parlakci
9a59da0c5f Update changelog 2018-08-16 19:53:33 +03:00
Ali Parlakci
d56efed1c6 Fix imgur download malfunction caused by headers 2018-08-16 19:51:56 +03:00
7 changed files with 83 additions and 60 deletions

View File

@@ -1,7 +1,7 @@
# Bulk Downloader for Reddit
Downloads media from reddit posts.
## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## [Download the latest release here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
## What it can do
- Can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
@@ -13,8 +13,8 @@ Downloads media from reddit posts.
- Saves a reusable copy of posts' details that are found so that they can be re-downloaded again
- Logs failed ones in a file to so that you can try to download them later
## **[Compiling it from source code](docs/COMPILE_FROM_SOURCE.md)**
*\* MacOS users have to use this option.*
## **Compiling it from source code**
MacOS users have to use this option. See *[here](docs/COMPILE_FROM_SOURCE.md)*
## Additional options
Script also accepts additional options via command-line arguments. Get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
@@ -24,6 +24,30 @@ You need to create an imgur developer app in order API to work. Go to https://ap
It should redirect you to a page where it shows your **imgur_client_id** and **imgur_client_secret**.
## [FAQ](docs/FAQ.md)
## FAQ
### How can I change my credentials?
- All of the user data is held in **config.json** file which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit
them, there.
## [Changes on *master*](docs/CHANGELOG.md)
### What do the dots resemble when getting posts?
- Each dot means that 100 posts are scanned.
### Getting posts takes too long.
- You can press *Ctrl+C* to interrupt it and start downloading.
### How are the filenames formatted?
- **Self posts** and **images** that do not belong to an album and **album folders** are formatted as:
`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`
You can use *reddit id* to go to post's reddit page by going to link reddit.com/[REDDIT ID]
- An **image in an album** is formatted as:
`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`
Similarly, you can use *imgur id* to go to image's imgur page by going to link imgur.com/[IMGUR ID].
### How do I open self post files?
- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).
However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS
## [See the changes on *master* here](docs/CHANGELOG.md)

View File

@@ -1,4 +1,14 @@
# Changes on *master*
## [27/01/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/b7baf07fb5998368d87e3c4c36aed40daf820609)
- Clarified the instructions
## [28/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Adjusted algorith used for extracting gfycat links because of gfycat's design change
- Ignore space at the end of the given directory
## [16/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
- Fix the bug that prevents downloading imgur videos
## [15/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/adccd8f3ba03ad124d58643d78dab287a4123a6f)
- Prints out the title of posts' that are already downloaded

View File

@@ -1,9 +1,9 @@
# Compiling from source code
## Requirements
### Python 3 Interpreter
Latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.
\* *Use Python 3.6.5 if you encounter an issue*
- This program is designed to work best on **Python 3.6.5** and this version of Python 3 is suggested. See if it is already installed, [here](#finding-the-correct-keyword-for-python).
- If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.
## Using terminal
### To open it...
- **On Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**

View File

@@ -1,23 +0,0 @@
# FAQ
## What do the dots resemble when getting posts?
- Each dot means that 100 posts are scanned.
## Getting posts is taking too long.
- You can press Ctrl+C to interrupt it and start downloading.
## How are filenames formatted?
- Self posts and images that are not belong to an album are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
You can use *reddit id* to go to post's reddit page by going to link **reddit.com/[REDDIT ID]**
- An image in an imgur album is formatted as **`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`**
Similarly, you can use *imgur id* to go to image's imgur page by going to link **imgur.com/[IMGUR ID]**.
## How do I open self post files?
- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).
However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS
## How can I change my credentials?
- All of the user data is held in **config.json** file which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit
them, there.

View File

@@ -1,3 +1,4 @@
bs4
requests
praw
imgurpython

View File

@@ -23,7 +23,7 @@ from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,
__author__ = "Ali Parlakci"
__license__ = "GPL"
__version__ = "1.6.2"
__version__ = "1.6.4.1"
__maintainer__ = "Ali Parlakci"
__email__ = "parlakciali@gmail.com"
@@ -265,18 +265,22 @@ class PromptUser:
if programMode == "subreddit":
subredditInput = input("subreddit (enter frontpage for frontpage): ")
subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
" use plus to seperate multi subreddits:" \
" pics+funny+me_irl etc.)\n\n" \
"subreddit: ")
GLOBAL.arguments.subreddit = subredditInput
while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
subredditInput = input("subreddit: ")
GLOBAL.arguments.subreddit += "+" + subredditInput
# while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
# subredditInput = input("subreddit: ")
# GLOBAL.arguments.subreddit += "+" + subredditInput
if " " in GLOBAL.arguments.subreddit:
GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit.split())
# DELETE THE PLUS (+) AT THE END
if not subredditInput.lower() == "frontpage":
if not subredditInput.lower() == "frontpage" \
and GLOBAL.arguments.subreddit[-1] == "+":
GLOBAL.arguments.subreddit = GLOBAL.arguments.subreddit[:-1]
print("\nselect sort type:")
@@ -297,7 +301,7 @@ class PromptUser:
GLOBAL.arguments.time = "all"
elif programMode == "multireddit":
GLOBAL.arguments.user = input("\nredditor: ")
GLOBAL.arguments.user = input("\nmultireddit owner: ")
GLOBAL.arguments.multireddit = input("\nmultireddit: ")
print("\nselect sort type:")
@@ -569,7 +573,9 @@ def download(submissions):
print(f" {submissions[i]['postType'].upper()}",end="",noPrint=True)
if isPostExists(submissions[i]):
print(f"\n{nameCorrector(submissions[i]['postTitle'])}")
print(f"\n" \
f"{submissions[i]['postSubmitter']}_"
f"{nameCorrector(submissions[i]['postTitle'])}")
print("It already exists")
duplicates += 1
downloadedCount -= 1
@@ -633,23 +639,33 @@ def download(submissions):
downloadedCount -= 1
if duplicates:
print("\n There was {} duplicates".format(duplicates))
print(f"\nThere {'were' if duplicates > 1 else 'was'} " \
f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")
if downloadedCount == 0:
print(" Nothing downloaded :(")
print("Nothing downloaded :(")
else:
print(" Total of {} links downloaded!".format(downloadedCount))
print(f"Total of {downloadedCount} " \
f"link{'s' if downloadedCount > 1 else ''} downloaded!")
def main():
VanillaPrint(
f"\nBulk Downloader for Reddit v{__version__}\n" \
f"Written by Ali PARLAKCI parlakciali@gmail.com\n\n" \
f"https://github.com/aliparlakci/bulk-downloader-for-reddit/"
)
GLOBAL.arguments = parseArguments()
if GLOBAL.arguments.directory is not None:
GLOBAL.directory = Path(GLOBAL.arguments.directory)
GLOBAL.directory = Path(GLOBAL.arguments.directory.strip())
else:
GLOBAL.directory = Path(input("download directory: "))
GLOBAL.directory = Path(input("\ndownload directory: ").strip())
print("\n"," ".join(sys.argv),"\n",noPrint=True)
print(f"Bulk Downloader for Reddit v{__version__}\n",noPrint=True
)
try:
checkConflicts()

View File

@@ -1,13 +1,15 @@
import io
import json
import os
import sys
import urllib.request
from html.parser import HTMLParser
from multiprocessing import Queue
from pathlib import Path
from urllib.error import HTTPError
import imgurpython
from multiprocessing import Queue
from bs4 import BeautifulSoup
from src.errors import (AlbumNotDownloadedCompletely, FileAlreadyExistsError,
FileNameTooLong, ImgurLoginError,
@@ -66,7 +68,8 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
]
opener = urllib.request.build_opener()
opener.addheaders = headers
if not "imgur" in imageURL:
opener.addheaders = headers
urllib.request.install_opener(opener)
if not (os.path.isfile(fileDir)):
@@ -441,24 +444,16 @@ class Gfycat:
url = "https://gfycat.com/" + url.split('/')[-1]
pageSource = (urllib.request.urlopen(url).read().decode().split('\n'))
pageSource = (urllib.request.urlopen(url).read().decode())
theLine = pageSource[lineNumber]
lenght = len(query)
link = []
soup = BeautifulSoup(pageSource, "html.parser")
attributes = {"data-react-helmet":"true","type":"application/ld+json"}
content = soup.find("script",attrs=attributes)
for i in range(len(theLine)):
if theLine[i:i+lenght] == query:
cursor = (i+lenght)+1
while not theLine[cursor] == '"':
link.append(theLine[cursor])
cursor += 1
break
if "".join(link) == "":
if content is None:
raise NotADownloadableLinkError("Could not read the page source")
return "".join(link)
return json.loads(content.text)["video"]["contentUrl"]
class Direct:
def __init__(self,directory,POST):