Changes from all commits (46 commits)
563c140
Perform 2to3
ohyou Jun 19, 2016
56b361d
Make it work as a module
ohyou Jun 19, 2016
2a4477e
Remove unnecessary output
ohyou Jun 20, 2016
60f0856
Fix errors when downloading from gfycat
ohyou Jul 1, 2016
236a592
Handle some errors
ohyou Jul 3, 2016
8c1311f
update .gitignore
jtara1 Aug 25, 2016
ab18926
Merge https://github.com/ohyou/RedditImageGrab
jtara1 Aug 25, 2016
bf02892
downloads imgur imgs using jtara1/imgur-album-downloader
jtara1 Aug 25, 2016
0edcfb3
fix func names & update .gitignore
jtara1 Aug 25, 2016
75f4c8b
fix slugify func so --filename-format arg works
jtara1 Aug 26, 2016
3d29e43
added func to make cleaner filename, update readme
jtara1 Aug 26, 2016
1758e32
updates & fixes, func process_deviant_url works, & ImgurDownloader up…
jtara1 Aug 30, 2016
9258a07
new feature, last-id tracking in file '._history.txt'
jtara1 Aug 31, 2016
d32ca53
new feature, load subreddits from subreddits text file and process ea…
jtara1 Aug 31, 2016
b401526
update readme.md & add todo.md
jtara1 Aug 31, 2016
1c85413
fix, [--subreddit-list srl] and [--dir <dest_file>] work properly now
jtara1 Sep 1, 2016
0186590
update DOWNLOADED, ERRORS, & other vars to keep track of progress
jtara1 Sep 1, 2016
2c5321d
update readme
jtara1 Sep 1, 2016
167e5df
critical fix, parse_subreddit_list will now break main loop when newl…
jtara1 Sep 1, 2016
6539870
critical fix, --subreddits-list arg and everything else should be wor…
jtara1 Sep 1, 2016
e90d53d
parse_subreddit_list.py now skips iteration (continue) when line=='\n'
jtara1 Sep 1, 2016
82a89d8
reorganized loop to handle subreddits more clearly, changed user-agen…
jtara1 Sep 1, 2016
839fc57
update readme & todo
jtara1 Sep 3, 2016
72a60b9
fix ARGS.num checker, update readme
jtara1 Sep 9, 2016
af0b953
update readme
jtara1 Sep 10, 2016
3361953
add jtara1/imgur-downloader repository
jtara1 Sep 11, 2016
28bf0d1
added jtara1/imgur-downloader & rm test imgs
jtara1 Sep 11, 2016
ec208d5
update, rename some files
jtara1 Sep 11, 2016
a3c430c
CRIT FIXES: url corrected in reddit.py, PROG_REPORT variables corrected
jtara1 Sep 12, 2016
90e6860
update
jtara1 Sep 12, 2016
1cd0764
hot fix for gfycat & printing is more helpful
jtara1 Sep 12, 2016
3517e8d
update readme
jtara1 Sep 12, 2016
5a92db0
update readme
jtara1 Sep 14, 2016
ba4e8e6
fix gfycat downloading, Does Not Exist errors are handled properly now
jtara1 Sep 14, 2016
9177594
update txt
jtara1 Sep 14, 2016
8c9be20
update readme
jtara1 Sep 14, 2016
eeb204e
update imgur-downloader
jtara1 Sep 15, 2016
3426c5d
rm imgur-downloader/readme.md
jtara1 Sep 15, 2016
8e4787e
add subreddit list examples
jtara1 Sep 15, 2016
e113d2c
update history_log, cli args, readme, & vars renamed
jtara1 Sep 16, 2016
ab53b4a
new cli arg, --restart (begins downloading from beginning of subreddit
jtara1 Sep 16, 2016
f9a27f1
update readme, todo, minor fixes
jtara1 Sep 17, 2016
ef89c68
update history_log func, imports relocated in gfycat.py
jtara1 Sep 17, 2016
2c9ffda
update parse_subreddit_list, CRIT FIX: imgur-downloader
jtara1 Sep 17, 2016
5399043
update imgur-downloader docstrings
jtara1 Sep 17, 2016
ddf0745
hotfix for when last ITEM is comment thread (caused infinite looping)
jtara1 Sep 17, 2016
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,15 +1,20 @@
*/*.jpg
*/*.png
*/*.webm
*/*.mp4
*.swp
*.bak
*.DS_Store
*.sh
*.pyc
/.idea

venv/*

/.project
/*~
/*.webm
/gfycat
/build
/.pydevproject

cli.txt
81 changes: 62 additions & 19 deletions readme.md → README.md
@@ -8,15 +8,56 @@ fresh and interesting. The main idea is that the script would download
any JPEG- or PNG-formatted image it found listed in the specified
subreddit to a folder.

## jtara1 Fork
> **Contributor** (inline comment): only for fork


# Requirements:
### Features and Changes:

* Python 2 (Python3 might be supported over 2to3, but see for
yourself and report back).
* Optional requirements: listed in setup.py under extras_require.


# Usage:
* Adapted to Python 3, mostly by merging [ohyou/RedditImageGrab](https://github.com/ohyou/RedditImageGrab), along with some additional fixes

* `--num` cli argument now counts by reddit submission rather than individual image

* added submodule `imgur-downloader` which enabled the above feature among other things


* file `._history.txt` stores the reddit id of the last downloaded submission, keyed by `subreddit` & `ARGS.sort_type` (a sketch of this bookkeeping follows this list), e.g.:

> {'wallpapers': {'topmonth': {'last-id': '4x4so2'}}}

* positional argument, `<subreddit>`, now autodetects whether its value is a subreddit name or a subreddit list file


* `--subreddit-list srl-filename` cli argument added, where srl-filename is a text file containing the list of subreddits to process

* added a function that parses the subreddit list for subreddit links & an associated save location for each

* at this time, the same cli arguments are used for every subreddit in the list, but the save folder can be altered

* examples for subreddits.txt added, in folder `subreddit-list-examples`

* updated progress report variables such as DOWNLOADED and ERRORS to accommodate processing a list of subreddits

* `--restart` cli arg added which begins downloading from the beginning of the subreddit rather than resuming from last download ID.
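
The last-id bookkeeping can be sketched as follows (a minimal illustration of the `._history.txt` format shown above, not the PR's exact implementation; the helper names and the use of `ast.literal_eval` are assumptions, chosen because the file shown uses Python-repr quoting rather than strict JSON):

    import ast
    import os

    def load_last_id(history_path, subreddit, sort_type):
        # hypothetical helper: '._history.txt' holds a nested dict keyed by
        # subreddit, then sort type, e.g.
        # {'wallpapers': {'topmonth': {'last-id': '4x4so2'}}}
        if not os.path.isfile(history_path):
            return None
        with open(history_path) as f:
            history = ast.literal_eval(f.read())
        return history.get(subreddit, {}).get(sort_type, {}).get('last-id')

    def save_last_id(history_path, subreddit, sort_type, last_id):
        # hypothetical helper: record the newest downloaded submission id
        history = {}
        if os.path.isfile(history_path):
            with open(history_path) as f:
                history = ast.literal_eval(f.read())
        history.setdefault(subreddit, {}).setdefault(sort_type, {})['last-id'] = last_id
        with open(history_path, 'w') as f:
            f.write(repr(history))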

### Fixes:

* `--filename-format` cli arg now works as expected

* `gfycat.py` failed to download direct links to .webm & .mp4 files (see the sketch after this list)

* `gfycat.py` failed to process gfycat links that did not exist
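
The direct-link fix amounts to fetching `.webm`/`.mp4` URLs as-is instead of resolving them through the gfycat API; a minimal sketch under that assumption (a hypothetical helper, reusing the User-Agent workaround from `gfycat.py`):

    import urllib.request

    def fetch_direct(url, dest):
        # direct .webm/.mp4 links need no API lookup; gfycat page links
        # would first be resolved to an mp4Url via the /cajax/get endpoint
        headers = {'User-Agent': 'Mozilla/5.0'}  # avoid CloudFlare blocks
        req = urllib.request.Request(url, None, headers)
        data = urllib.request.urlopen(req).read()
        with open(dest, 'wb') as f:
            f.write(data)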

## Issues
> **Contributor** (inline comment): rather than issue, maybe todo?


* needs more testing

## Requirements:

* Python 3
* Optional requirements: listed in setup.py under extras_require.

## Usage:

See `./redditdl.py --help` for up-to-date details.

@@ -33,14 +74,16 @@ ordering = ('key', )

Downloads files with specified extension from the specified subreddit.

positional arguments:
main arguments:

<subreddit> Subreddit name.
<dest_file> Dir to put downloaded files in.
subreddit <subreddit> Subreddit or subreddit list file name.
dir <dest_file> Dir to put downloaded files in.

optional arguments:

-h, --help show this help message and exit
--subreddit-list srl-filename
Take a list of subreddits from a text file, srl = subreddits.txt
--multireddit Take multireddit instead of subreddit as input. If so,
provide /user/m/multireddit-name as argument
--last l ID of the last downloaded file.
@@ -54,43 +97,43 @@ optional arguments:
--skipAlbums Skip all albums
--mirror-gfycat Download available mirror in gfycat.com.
--filename-format FILENAME_FORMAT
Specify filename format: reddit (default), title or
url
Specify filename format: reddit (default), title or url
--sort-type Sort the subreddit.
--restart Begin downloading from the beginning of the subreddit rather than resuming from the last downloaded submission.


# Examples
## Examples

An example of running this script to download images with a score
greater than 50 from the wallpaper subreddit into a folder called
wallpaper would be as follows:

python redditdl.py wallpaper wallpaper --score 50
python3 redditdl.py wallpaper wallpaper --score 50

And to run the same query but only get new images you don't already
have, run the following:

python redditdl.py wallpaper wallpaper --score 50 -update
python3 redditdl.py wallpaper wallpaper --score 50 -update

For getting some nice pictures of cats in your catsfolder (which will be created if it
doesn't exist yet) run:

python redditdl.py cats ~/Pictures/catsfolder --score 1000 --num 5 --sfw --verbose
python3 redditdl.py cats ~/Pictures/catsfolder --score 1000 --num 5 --sfw --verbose


## Advanced Examples
### Advanced Examples

Retrieve last 10 pics in the 'wallpaper' subreddit with the word
Retrieve pics from the last 10 submissions in the 'wallpaper' subreddit with the word
"sunset" in the title (note: case is ignored by (?i) predicate)

python redditdl.py wallpaper sunsets --regex '(?i).*sunset.*' --num 10
python3 redditdl.py wallpaper sunsets --regex '(?i).*sunset.*' --num 10

Download the top post of the week from the subreddit 'animegifs', using the gfycat mirror (if available)

python redditdl.py animegifs --sort-type topweek --mirror-gfycat
python3 redditdl.py animegifs --sort-type topweek --mirror-gfycat
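
To run one query over every subreddit in a list file (one entry per line, as in the `subreddit-list-examples` folder), pass the file in place of the subreddit name; this sketch assumes a local `subreddits.txt`:

    python3 redditdl.py subreddits.txt ~/Pictures --num 10

Adding `--restart` would start each subreddit from the beginning instead of resuming from the recorded last-id.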


## Sorting
### Sorting

The available sort types are the following: hot, new, rising, controversial, top, gilded
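
For example, to grab this month's top submissions (assuming the same period-suffix convention as the topweek and topmonth values used above):

    python3 redditdl.py wallpapers wallpapers --sort-type topmonth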

7 changes: 7 additions & 0 deletions TODO.md
@@ -0,0 +1,7 @@
## todo

* fix downloading from deviantart, tumblr, pixiv.net, instagram & other sites

* record metadata (submission link & comments, local file location) in database

* integrate youtube-dl module to handle all video links (a sketch follows)
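
A rough sketch of what the youtube-dl integration might look like, using youtube-dl's documented embedding API (the helper name and option values are assumptions, not part of this PR):

    import youtube_dl  # pip install youtube_dl

    def download_video(url, dest_dir):
        # hypothetical helper: let youtube-dl handle any video link it supports
        opts = {'outtmpl': dest_dir + '/%(id)s.%(ext)s', 'quiet': True}
        with youtube_dl.YoutubeDL(opts) as ydl:
            ydl.download([url])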
2 changes: 1 addition & 1 deletion redditdl.py
@@ -11,4 +11,4 @@


if __name__ == '__main__':
main()
main("")
37 changes: 18 additions & 19 deletions redditdownload/gfycat.py
@@ -1,5 +1,10 @@
from collections import namedtuple

import urllib.request, urllib.error, urllib.parse
from urllib.error import URLError
import json
import random
import string
import requests

class gfycat(object):

@@ -23,21 +28,17 @@ def __init__(self):
super(gfycat, self).__init__()

def __fetch(self, url, param):
import urllib2
import json
try:
# added simple User-Agent string to avoid CloudFlare blocking this request
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url+param, None, headers)
connection = urllib2.urlopen(req).read()
except urllib2.HTTPError, err:
req = urllib.request.Request(url+param, None, headers)
connection = urllib.request.urlopen(req).read()
except urllib.error.HTTPError as err:
raise ValueError(err.read())
result = namedtuple("result", "raw json")
return result(raw=connection, json=json.loads(connection))
return result(raw=connection, json=json.loads(connection.decode('ascii')))

def upload(self, param):
import random
import string
# gfycat needs to get a random string before our search parameter
randomString = ''.join(random.choice
(string.ascii_uppercase + string.digits) for _ in range(5))
@@ -55,9 +56,6 @@ def uploadFile(self, file):

def __fileHandler(self, file):
# Thanks thesourabh for the implementation
import random
import string
import requests
# gfycat needs a random key before upload
key = ''.join(random.choice
(string.ascii_uppercase + string.digits) for _ in range(10))
@@ -80,8 +78,11 @@ def __fileHandler(self, file):

def more(self, param):
result = self.__fetch(self.url, "/cajax/get/%s" % param)
if "error" in result.json["gfyItem"]:
raise ValueError("%s" % self.json["gfyItem"]["error"])
try:
if result.json['error']:
raise URLError('%s%s%s' % ('DNE: ', 'http://gfycat.com/', param))
except KeyError:
pass # no error reported in json
return _gfycatMore(result)

def check(self, param):
@@ -117,26 +118,24 @@ def get(self, what):
return ("Sorry, can't find %s" % error)

def download(self, location):
import urllib2
if not location.endswith(".mp4"):
location = location + self.get("gfyName") + ".mp4"
try:
# added simple User-Agent string to avoid CloudFlare blocking this request
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(self.get("mp4Url"), None, headers)
file = urllib2.urlopen(req)
req = urllib.request.Request(self.get("mp4Url"), None, headers)
file = urllib.request.urlopen(req)
# make sure that the status code is 200, and the content type is mp4
if int(file.code) != 200 or file.headers["content-type"] != "video/mp4":
raise ValueError("Problem downloading the file. Status code is %s or the content-type is not right %s"
% (file.code, file.headers["content-type"]))
data = file.read()
with open(location, "wb") as mp4:
mp4.write(data)
except urllib2.HTTPError, err:
except urllib.error.HTTPError as err:
raise ValueError(err.read())

def formated(self, ignoreNull=False):
import json
if not ignoreNull:
return json.dumps(self.js, indent=4,
separators=(',', ': ')).strip('{}\n')
22 changes: 11 additions & 11 deletions redditdownload/img_scrap_stuff.py
@@ -10,11 +10,11 @@
import re
import json
import logging
import urlparse
import urllib.parse
import traceback

from PIL import Image
from cStringIO import StringIO
from io import StringIO
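# note: cStringIO accepted bytes in Python 2; if this buffer ends up wrapping
# binary image data for PIL's Image.open, io.BytesIO is needed in Python 3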
import lxml
import html5lib # Heavily recommended for bs4 (apparently)
import bs4
@@ -52,7 +52,7 @@ def indexall_re(topstr, substr_re):
def walker(text, opening='{', closing='}'):
""" A near-useless experiment that was intended for `get_all_objects` """
stack = []
for pos in xrange(len(text)):
for pos in range(len(text)):
if text[pos:pos + len(opening)] == opening:
stack.append(pos)
continue
@@ -88,7 +88,7 @@ def get_all_objects(text, beginning=r'{', debug=False):
"""

def _dbg_actual(st, *ar):
print "D: ", st % ar
print("D: ", st % ar)

_dbg = _dbg_actual if debug else (lambda *ar: None)

Expand All @@ -106,9 +106,9 @@ def __getitem__(self, key):
class TheLoader(yaml.SafeLoader):
ESCAPE_REPLACEMENTS = ddd(yaml.SafeLoader.ESCAPE_REPLACEMENTS)

from cStringIO import StringIO
from io import StringIO
# optimised slicing
if isinstance(text, unicode):
if isinstance(text, str):
_dbg("encoding")
text = text.encode('utf-8')
_dbg("Length: %r", len(text))
@@ -214,13 +214,13 @@ def get_get_get(url, **kwa):

def get_get(*ar, **kwa):
retries = kwa.pop('_xretries', 5)
for retry in xrange(retries):
for retry in range(retries):
try:
return get_get_get(*ar, **kwa)
except Exception as exc:
traceback.print_exc()
ee = exc
print "On retry #%r (%s)" % (retry, repr(exc)[:30])
print("On retry #%r (%s)" % (retry, repr(exc)[:30]))
raise GetError(ee)


@@ -244,7 +244,7 @@ def get(url, cache_file=None, req_params=None, bs=True, response=False, undecode
for chunk in resp.iter_content(chunk_size=16384):
data += chunk
if len(data) > _max_len:
print "Too large"
print("Too large")
break
data = bytes(data) ## Have to, alas.
data_bytes = data
@@ -274,7 +274,7 @@ def _filter(l):


def _url_abs(l, base_url):
return (urlparse.urljoin(base_url, v) for v in l)
return (urllib.parse.urljoin(base_url, v) for v in l)


def _preprocess_bs_links(bs, links):
@@ -413,7 +413,7 @@ def _pp(lst):
for val in lst
if val.startswith('http') or val.startswith('/')]
# (urljoin should be done already though)
return [urlparse.urljoin(url, val) for val in res]
return [urllib.parse.urljoin(url, val) for val in res]

imgs, links = bs2img(bs), bs2lnk(bs)
to_check = imgs + links
3 changes: 3 additions & 0 deletions redditdownload/imgur-downloader/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
.DS_Store
test.py
*.pyc
7 changes: 7 additions & 0 deletions redditdownload/imgur-downloader/LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Copyright (C) 2012 Alex Gisby

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Empty file.
Binary file added redditdownload/imgur-downloader/imgur-dne.png