Bug fix

Updated Python3 version
Moved FAQ
2026-01-10 19:25:41 +00:00 · 2019-01-27 17:05:31 +03:00 · 2019-01-27 16:32:43 +03:00 · 2019-01-27 16:32:00 +03:00 · 2019-01-27 16:06:31 +03:00 · 2019-01-27 15:59:24 +03:00
7 changed files with 83 additions and 60 deletions
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Bulk Downloader for Reddit
 Downloads media from reddit posts.

-## [Download the latest release](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)
+## [Download the latest release here](https://github.com/aliparlakci/bulk-downloader-for-reddit/releases/latest)

 ## What it can do
 - Can get posts from: frontpage, subreddits, multireddits, redditor's submissions, upvoted and saved posts; search results or just plain reddit links
@@ -13,8 +13,8 @@ Downloads media from reddit posts.
 - Saves a reusable copy of posts' details that are found so that they can be re-downloaded again
 - Logs failed ones in a file so that you can try to download them later

-## **[Compiling it from source code](docs/COMPILE_FROM_SOURCE.md)**
-*\* MacOS users have to use this option.*
+## **Compiling it from source code**
+macOS users have to use this option. See the instructions *[here](docs/COMPILE_FROM_SOURCE.md)*.

 ## Additional options
 Script also accepts additional options via command-line arguments. Get further information from **[`--help`](docs/COMMAND_LINE_ARGUMENTS.md)**
@@ -24,6 +24,30 @@ You need to create an imgur developer app in order API to work. Go to https://ap
  
 It should redirect you to a page where it shows your **imgur_client_id** and **imgur_client_secret**.
  
-## [FAQ](docs/FAQ.md)
+## FAQ
+### How can I change my credentials?
+- All user data is stored in the **config.json** file, which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit
+  it there.

-## [Changes on *master*](docs/CHANGELOG.md)
+### What do the dots represent when getting posts?
+- Each dot means that 100 posts are scanned. 
+  
+### Getting posts takes too long.
+- You can press *Ctrl+C* to interrupt it and start downloading.
+  
+### How are the filenames formatted?
+- **Self posts** and **images** that do not belong to an album and **album folders** are formatted as:  
+  `[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`  
+  You can use the *reddit id* to open the post's reddit page at reddit.com/[REDDIT ID].
+  
+- An **image in an album** is formatted as:  
+  `[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`  
+  Similarly, you can use the *imgur id* to open the image's imgur page at imgur.com/[IMGUR ID].
+
+### How do I open self post files?
+- Self posts are stored on reddit as Markdown, so the script downloads them as they are in order to preserve their styling.
+  There is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with their styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).  
+
+  They are also plain text files, so you can view them with any text editor, such as Notepad on Windows, gedit on Linux or TextEdit on macOS.
+
+## [See the changes on *master* here](docs/CHANGELOG.md)
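The filename scheme described in the new FAQ section can be sketched in a few lines of Python. This is an illustration only: `sanitize`, `build_post_filename` and `reddit_id_from_filename` are hypothetical names, the set of replaced characters is an assumption, and none of it is taken from the project's own `nameCorrector`. The sketch works on the name without a file extension.

```python
import re


def sanitize(title: str) -> str:
    """Replace characters that are commonly invalid in filenames (assumed set)."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def build_post_filename(submitter: str, title: str, reddit_id: str) -> str:
    """Compose [SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]."""
    return f"{submitter}_{sanitize(title)}_{reddit_id}"


def reddit_id_from_filename(filename: str) -> str:
    """Recover the reddit id, which is the part after the last underscore."""
    return filename.rsplit("_", 1)[-1]


name = build_post_filename("alice", "What is this?", "abc123")
print(name)
print("reddit.com/" + reddit_id_from_filename(name))
```

Splitting from the right is what makes the id recoverable even when the title itself contains underscores.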
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -1,4 +1,14 @@
 # Changes on *master*
+## [27/01/2019](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/b7baf07fb5998368d87e3c4c36aed40daf820609)
+- Clarified the instructions
+
+## [28/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
+- Adjusted the algorithm used for extracting gfycat links because of gfycat's design change
+- Ignore space at the end of the given directory
+
+## [16/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/d56efed1c6833a66322d9158523b89d0ce57f5de)
+- Fix the bug that prevents downloading imgur videos
+
 ## [15/08/2018](https://github.com/aliparlakci/bulk-downloader-for-reddit/tree/adccd8f3ba03ad124d58643d78dab287a4123a6f)
 - Prints out the titles of posts that are already downloaded

--- a/docs/COMPILE_FROM_SOURCE.md
+++ b/docs/COMPILE_FROM_SOURCE.md
@@ -1,9 +1,9 @@
 # Compiling from source code
 ## Requirements
 ### Python 3 Interpreter
-Latest* version of **Python 3** is needed. See if it is already installed [here](#finding-the-correct-keyword-for-python). If not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, selecting **Add Python 3 to PATH** option when installing the software is mandatory.   
-  
-\* *Use Python 3.6.5 if you encounter an issue*
+- This program is designed to work best on **Python 3.6.5**, which is the recommended version. To check whether it is already installed, see [here](#finding-the-correct-keyword-for-python).  
+- If it is not, download the matching release for your platform [here](https://www.python.org/downloads/) and install it. If you are a *Windows* user, you must select the **Add Python 3 to PATH** option during installation.   
+
 ## Using terminal
 ### To open it...
 -  **On Windows**: Press **Shift+Right Click**, select **Open Powershell window here** or **Open Command Prompt window here**
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -1,23 +0,0 @@
-# FAQ
-## What do the dots resemble when getting posts?
- Each dot means that 100 posts are scanned. 
-  
-## Getting posts is taking too long.
- You can press Ctrl+C to interrupt it and start downloading.
-  
-## How are filenames formatted?
- Self posts and images that are not belong to an album are formatted as **`[SUBMITTER NAME]_[POST TITLE]_[REDDIT ID]`**.
-  You can use *reddit id* to go to post's reddit page by going to link **reddit.com/[REDDIT ID]**
-  
- An image in an imgur album is formatted as **`[ITEM NUMBER]_[IMAGE TITLE]_[IMGUR ID]`**
-  Similarly, you can use *imgur id* to go to image's imgur page by going to link **imgur.com/[IMGUR ID]**.
-
-## How do I open self post files?
- Self posts are held at reddit as styled with markdown. So, the script downloads them as they are in order not to lose their stylings.
-  However, there is a [great Chrome extension](https://chrome.google.com/webstore/detail/markdown-viewer/ckkdlimhmcjmikdlpkmbgfkaikojcbjk) for viewing Markdown files with its styling. Install it and open the files with [Chrome](https://www.google.com/intl/tr/chrome/).  
-
-  However, they are basically text files. You can also view them with any text editor such as Notepad on Windows, gedit on Linux or Text Editor on MacOS
-
-## How can I change my credentials?
- All of the user data is held in **config.json** file which is in a folder named "Bulk Downloader for Reddit" in your **Home** directory. You can edit 
-  them, there.
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,4 @@
+bs4
 requests
 praw
 imgurpython
--- a/script.py
+++ b/script.py
@@ -23,7 +23,7 @@ from src.tools import (GLOBAL, createLogFile, jsonFile, nameCorrector,

 __author__ = "Ali Parlakci"
 __license__ = "GPL"
-__version__ = "1.6.2"
+__version__ = "1.6.4.1"
 __maintainer__ = "Ali Parlakci"
 __email__ = "parlakciali@gmail.com"

@@ -265,18 +265,22 @@ class PromptUser:

        if programMode == "subreddit":

-            subredditInput = input("subreddit (enter frontpage for frontpage): ")
+            subredditInput = input("(type frontpage for all subscribed subreddits,\n" \
+                                   " use plus to separate multiple subreddits:" \
+                                   " pics+funny+me_irl etc.)\n\n" \
+                                   "subreddit: ")
            GLOBAL.arguments.subreddit = subredditInput

-            while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
-                subredditInput = input("subreddit: ")
-                GLOBAL.arguments.subreddit += "+" + subredditInput
+            # while not (subredditInput == "" or subredditInput.lower() == "frontpage"):
+            #     subredditInput = input("subreddit: ")
+            #     GLOBAL.arguments.subreddit += "+" + subredditInput

            if " " in GLOBAL.arguments.subreddit:
                GLOBAL.arguments.subreddit = "+".join(GLOBAL.arguments.subreddit.split())

            # DELETE THE PLUS (+) AT THE END
-            if not subredditInput.lower() == "frontpage":
+            if not subredditInput.lower() == "frontpage" \
+                and GLOBAL.arguments.subreddit.endswith("+"):
                GLOBAL.arguments.subreddit = GLOBAL.arguments.subreddit[:-1]

            print("\nselect sort type:")
@@ -297,7 +301,7 @@ class PromptUser:
                GLOBAL.arguments.time = "all"

        elif programMode == "multireddit":
-            GLOBAL.arguments.user = input("\nredditor: ")
+            GLOBAL.arguments.user = input("\nmultireddit owner: ")
            GLOBAL.arguments.multireddit = input("\nmultireddit: ")
            
            print("\nselect sort type:")
@@ -569,7 +573,9 @@ def download(submissions):
        print(f" – {submissions[i]['postType'].upper()}",end="",noPrint=True)

        if isPostExists(submissions[i]):
-            print(f"\n{nameCorrector(submissions[i]['postTitle'])}")
+            print(f"\n" \
+                  f"{submissions[i]['postSubmitter']}_"
+                  f"{nameCorrector(submissions[i]['postTitle'])}")
            print("It already exists")
            duplicates += 1
            downloadedCount -= 1
@@ -633,23 +639,33 @@ def download(submissions):
            downloadedCount -= 1

    if duplicates:
-        print("\n There was {} duplicates".format(duplicates))
+        print(f"\nThere {'were' if duplicates > 1 else 'was'} " \
+              f"{duplicates} duplicate{'s' if duplicates > 1 else ''}")

    if downloadedCount == 0:
-        print(" Nothing downloaded :(")
+        print("Nothing downloaded :(")

    else:
-        print(" Total of {} links downloaded!".format(downloadedCount))
+        print(f"Total of {downloadedCount} " \
+              f"link{'s' if downloadedCount > 1 else ''} downloaded!")

 def main():
+
+    VanillaPrint(
+        f"\nBulk Downloader for Reddit v{__version__}\n" \
+        f"Written by Ali PARLAKCI – parlakciali@gmail.com\n\n" \
+        f"https://github.com/aliparlakci/bulk-downloader-for-reddit/"
+    )
    GLOBAL.arguments = parseArguments()

    if GLOBAL.arguments.directory is not None:
-        GLOBAL.directory = Path(GLOBAL.arguments.directory)
+        GLOBAL.directory = Path(GLOBAL.arguments.directory.strip())
    else:
-        GLOBAL.directory = Path(input("download directory: "))
+        GLOBAL.directory = Path(input("\ndownload directory: ").strip())

    print("\n"," ".join(sys.argv),"\n",noPrint=True)
+    print(f"Bulk Downloader for Reddit v{__version__}\n",
+          noPrint=True)

    try:
        checkConflicts()
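The subreddit prompt changes above accept either spaces or plus signs between names and then drop a trailing plus. A compact way to express the same normalization is sketched below. `normalize_subreddits` is a hypothetical helper, not a function in `script.py`, and it splits and rejoins the names rather than slicing the string as the script does.

```python
import re


def normalize_subreddits(raw: str) -> str:
    """Turn user input such as 'pics funny+me_irl+' into 'pics+funny+me_irl'."""
    raw = raw.strip()
    if raw.lower() == "frontpage":
        # "frontpage" is a keyword, not a subreddit list, so pass it through.
        return "frontpage"
    # Split on any run of whitespace or plus signs and drop empty pieces,
    # which also removes leading, trailing and doubled separators.
    names = [name for name in re.split(r"[\s+]+", raw) if name]
    return "+".join(names)


print(normalize_subreddits("pics funny me_irl"))
print(normalize_subreddits("pics+funny+"))
```

Because empty pieces are filtered out, empty input yields an empty string instead of raising an error.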
--- a/src/downloader.py
+++ b/src/downloader.py
@@ -1,13 +1,15 @@
 import io
+import json
 import os
 import sys
 import urllib.request
 from html.parser import HTMLParser
+from multiprocessing import Queue
 from pathlib import Path
 from urllib.error import HTTPError

 import imgurpython
-from multiprocessing import Queue
+from bs4 import BeautifulSoup

 from src.errors import (AlbumNotDownloadedCompletely, FileAlreadyExistsError,
                        FileNameTooLong, ImgurLoginError,
@@ -66,7 +68,8 @@ def getFile(fileDir,tempDir,imageURL,indent=0):
    ]

    opener = urllib.request.build_opener()
-    opener.addheaders = headers
+    if "imgur" not in imageURL:
+        opener.addheaders = headers
    urllib.request.install_opener(opener)

    if not (os.path.isfile(fileDir)):
@@ -441,24 +444,16 @@ class Gfycat:

        url = "https://gfycat.com/" + url.split('/')[-1]

-        pageSource = (urllib.request.urlopen(url).read().decode().split('\n'))
+        pageSource = (urllib.request.urlopen(url).read().decode())

-        theLine = pageSource[lineNumber]
-        lenght = len(query)
-        link = []
+        soup = BeautifulSoup(pageSource, "html.parser")
+        attributes = {"data-react-helmet":"true","type":"application/ld+json"}
+        content = soup.find("script",attrs=attributes)

-        for i in range(len(theLine)):
-            if theLine[i:i+lenght] == query:
-                cursor = (i+lenght)+1
-                while not theLine[cursor] == '"':
-                    link.append(theLine[cursor])
-                    cursor += 1
-                break
-
-        if "".join(link) == "":
+        if content is None:
            raise NotADownloadableLinkError("Could not read the page source")

-        return "".join(link)
+        return json.loads(content.text)["video"]["contentUrl"]

 class Direct:
    def __init__(self,directory,POST):
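The gfycat change above replaces character-by-character scanning with a lookup in the page's JSON-LD `<script>` element. The sketch below shows the same idea using only the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra dependencies. `JsonLdExtractor` and `extract_content_url` are hypothetical names, and the sample page is made up for illustration. It assumes the `video.contentUrl` layout used in the diff.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._chunks))
            self._in_json_ld = False


def extract_content_url(page_source):
    """Return video.contentUrl from the first JSON-LD block that has it, else None."""
    parser = JsonLdExtractor()
    parser.feed(page_source)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        video = data.get("video") if isinstance(data, dict) else None
        if isinstance(video, dict) and "contentUrl" in video:
            return video["contentUrl"]
    return None


# Made-up page source, shaped like the markup the diff searches for.
sample = (
    '<html><head><script data-react-helmet="true" type="application/ld+json">'
    '{"video": {"contentUrl": "https://example.com/clip.mp4"}}'
    "</script></head></html>"
)
print(extract_content_url(sample))
```

Returning `None` when nothing matches leaves the decision to the caller, which could raise `NotADownloadableLinkError` as the diff does.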
Author	SHA1	Message	Date
Ali Parlakci	82dcd2f63d	Bug fix	2019-01-27 17:05:31 +03:00
Ali Parlakci	08de21a364	Updated Python3 version	2019-01-27 16:32:43 +03:00
Ali Parlakci	af7d3d9151	Moved FAQ	2019-01-27 16:32:00 +03:00
Ali Parlakci	280147282b	27 jan update	2019-01-27 16:06:31 +03:00
Ali Parlakci	b7baf07fb5	Added instructions	2019-01-27 15:59:24 +03:00
Ali Parlakci	aece2273fb	Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit	2018-08-28 16:28:29 +03:00
Ali Parlakci	f807efe4d5	Ignore space at the end of directory	2018-08-28 16:27:29 +03:00
Ali Parlakci	743d887927	Ignore space at the end of directory	2018-08-28 16:24:14 +03:00
Ali Parlakci	da5492858c	Add bs4	2018-08-28 16:15:22 +03:00
Ali Parlakci	cebfc713d2	Merge branch 'master' of https://github.com/aliparlakci/bulk-downloader-for-reddit	2018-08-28 16:12:01 +03:00
Ali Parlakci	f522154214	Update version	2018-08-28 16:11:48 +03:00
Ali Parlakçı	27cd3ee991	Changed getting gfycat links' algorithm	2018-08-28 16:10:15 +03:00
Ali Parlakci	29873331e6	Typo fix	2018-08-23 16:41:07 +03:00
Ali Parlakci	8a3dcd68a3	Update version	2018-08-23 12:16:31 +03:00
Ali Parlakci	ac323f2abe	Bug fix	2018-08-23 12:09:56 +03:00
Ali Parlakci	32d26fa956	Print out github link at start	2018-08-20 15:13:42 +03:00
Ali Parlakci	137481cf3e	Print out program info	2018-08-18 14:51:20 +03:00
Ali Parlakci	9b63c55d3e	Print out version info before starting	2018-08-17 21:25:01 +03:00
Ali Parlakci	3a6954c7d3	Update version	2018-08-16 19:55:45 +03:00
Ali Parlakci	9a59da0c5f	Update changelog	2018-08-16 19:53:33 +03:00
Ali Parlakci	d56efed1c6	Fix imgur download malfunction caused by headers	2018-08-16 19:51:56 +03:00