RCS file: /var/cvs/ruby-xmltv/tv_grab_nl_upc,v Working file: tv_grab_nl_upc head: 1.228 branch: locks: strict access list: symbolic names: v1-10-0: 1.228 v1-9-3: 1.225 v1-9-2: 1.223 v1-9-1: 1.220 v1-9-0: 1.218 v1-8-3: 1.212 v1-8-2: 1.210 v1-8-1: 1.208 v1-8-0: 1.206 v1-7-0: 1.200 v1-6-1: 1.189 v1-6-0: 1.185 v1-5-3: 1.180 v1-5-2: 1.178 v1-5-1: 1.176 v1-5-0: 1.173 v1-1-0: 1.168 v1-0-1: 1.159 v1-0-0: 1.157 v0-10-1: 1.153 v0-10-0: 1.151 v0-9-10: 1.140 v0-9-9: 1.133 v0-9-8: 1.131 v0-9-7: 1.126 v0-9-6: 1.123 v0-9-5: 1.120 v0-9-4: 1.118 v0-9-3: 1.116 v0-9-2: 1.113 v0-9-1: 1.111 v0-9-0: 1.108 v0-8-9: 1.96 v0-8-8: 1.95 v0-8-7: 1.93 v0-8-6: 1.91 v0-8-5: 1.89 v0-8-4: 1.87 v0-8-3: 1.85 v0-8-2: 1.83 v0-8-1: 1.78 v0-8-0: 1.76 v0-7-2: 1.70 v0-7-1: 1.69 v0-7-0: 1.67 v0-6-1: 1.64 v0-6-0: 1.59 v0-5-1: 1.49 v0-5-0: 1.43 v0-4-0: 1.34 v0-3-0: 1.16 v0-2-0: 1.6 v0-1-0: 1.1.1.1 ruby-xmltv: 1.1.1.1 default: 1.1.1.1 caliban: 1.1.1 keyword substitution: kv total revisions: 229; selected revisions: 229 description: ---------------------------- revision 1.228 date: 2011/04/05 07:33:12; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.10.0. ---------------------------- revision 1.227 date: 2011/04/05 07:28:50; author: ianmacd; state: Exp; lines: +28 -25 Programme detail errors are no longer fatal. ---------------------------- revision 1.226 date: 2011/04/05 07:05:35; author: ianmacd; state: Exp; lines: +28 -23 Fix more breakage caused by changes to the UPC site. ---------------------------- revision 1.225 date: 2011/01/22 00:03:19; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.9.3. ---------------------------- revision 1.224 date: 2011/01/22 00:02:57; author: ianmacd; state: Exp; lines: +6 -7 Fix longstanding bug that causes no pages to be fetched unless --verbose is used. ---------------------------- revision 1.223 date: 2011/01/01 14:23:52; author: ianmacd; state: Exp; lines: +4 -4 Bump version to 1.9.2. 
---------------------------- revision 1.222 date: 2011/01/01 01:49:07; author: ianmacd; state: Exp; lines: +10 -2 Allow String#bytesize to work for Ruby pre-1.8.7. ---------------------------- revision 1.221 date: 2011/01/01 01:22:25; author: ianmacd; state: Exp; lines: +24 -7 Deal with the case that the detail URL can't be found in the interim page. ---------------------------- revision 1.220 date: 2010/12/26 21:44:02; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.9.1. ---------------------------- revision 1.219 date: 2010/12/26 03:12:55; author: ianmacd; state: Exp; lines: +3 -3 --icons was failing, due to HTML changes at UPC. ---------------------------- revision 1.218 date: 2010/12/25 02:03:15; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.9.0. ---------------------------- revision 1.217 date: 2010/12/25 00:39:23; author: ianmacd; state: Exp; lines: +4 -4 IMDB ratings were still not working, because the denominator was missing. ---------------------------- revision 1.216 date: 2010/12/24 16:37:18; author: ianmacd; state: Exp; lines: +5 -5 Improve logging layout. ---------------------------- revision 1.215 date: 2010/12/24 04:52:11; author: ianmacd; state: Exp; lines: +3 -4 Fix IMDB rating regex to work with new HTML. ---------------------------- revision 1.214 date: 2010/12/24 02:44:26; author: ianmacd; state: Exp; lines: +27 -15 Make fixes for changes to UPC's site. ---------------------------- revision 1.213 date: 2010/05/15 21:43:28; author: ianmacd; state: Exp; lines: +3 -3 Fix American spelling of programme. ---------------------------- revision 1.212 date: 2010/03/26 09:34:05; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.8.3. ---------------------------- revision 1.211 date: 2010/03/26 09:33:43; author: ianmacd; state: Exp; lines: +13 -19 Fixed bug that caused programmes that began exactly at midnight to have a day added to their end time. 
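Revision 1.222 mentions making String#bytesize work on Ruby older than 1.8.7, but the log does not show how. A minimal sketch of one common approach follows. It assumes that String#size already counts bytes on those older interpreters, and it is not taken from the grabber's source.

```ruby
# Hypothetical compatibility shim, not the grabber's actual code.
# String#bytesize first appeared in Ruby 1.8.7. On older 1.8 interpreters,
# String#size counts bytes, so it can stand in for bytesize.
unless String.method_defined?(:bytesize)
  class String
    def bytesize
      size
    end
  end
end

puts 'programma'.bytesize
```

On any interpreter that already has bytesize, the guard makes the shim a no-op.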
---------------------------- revision 1.210 date: 2010/03/25 23:06:25; author: ianmacd; state: Exp; lines: +5 -4 Bump version to 1.8.2. ---------------------------- revision 1.209 date: 2010/03/25 18:34:07; author: ianmacd; state: Exp; lines: +22 -18 Fix serious bug introduced in 1.8.1 that caused lots of programming to erroneously get shunted to the next day. Ignored programmes were still added to cache total if they were previously cached. Remove a debugging statement left in by accident. ---------------------------- revision 1.208 date: 2010/03/19 10:56:20; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.8.1. ---------------------------- revision 1.207 date: 2010/03/19 10:56:03; author: ianmacd; state: Exp; lines: +12 -7 Fix a bug that occurs only when --days 1 is used in combination with --actual. Programmes that begin after midnight (so actually tomorrow) are erroneously seen as having already been broadcast, because the hour of their start is lower than the hour at which the grabber is being run. ---------------------------- revision 1.206 date: 2010/02/09 14:26:39; author: ianmacd; state: Exp; lines: +5 -4 Add a warning to the help text for --threads. ---------------------------- revision 1.205 date: 2010/02/09 14:04:49; author: ianmacd; state: Exp; lines: +4 -4 When backing-off, increase sleep time in increments of 0.2 seconds, not 0.25. ---------------------------- revision 1.204 date: 2010/02/09 14:00:11; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.8.0. ---------------------------- revision 1.203 date: 2010/02/09 13:50:22; author: ianmacd; state: Exp; lines: +3 -3 Server throttling returns HTTP 503, not 200. ---------------------------- revision 1.202 date: 2010/02/09 13:34:27; author: ianmacd; state: Exp; lines: +12 -3 Add new option, --user-agent STRING, to anticipate a possible future need to masquerade as a normal browser when talking to UPC. 
---------------------------- revision 1.201 date: 2010/02/09 13:15:50; author: ianmacd; state: Exp; lines: +21 -10 --sleep no longer worked properly, because it still assumed the old UPC architecture of one TV guide per channel per day. Consequently, most HTTP GETs weren't being followed by sleep. --sleep's argument was being displayed as an integer (%d) instead of a float (%f) in messages. Use %.2f for 2 decimal places. Altered a regex to be slightly more efficient. UPC has started throttling (i.e. rate-limiting) connections. This renders --threads all but unusable. We now detect this and increase --sleep by 0.25 seconds per HTTP GET until we are no longer being throttled. ---------------------------- revision 1.200 date: 2010/01/18 13:51:38; author: ianmacd; state: Exp; lines: +5 -4 Make it possible to negate --actual and --ignore-old. ---------------------------- revision 1.199 date: 2010/01/18 13:19:48; author: ianmacd; state: Exp; lines: +3 -3 Double-quote here-document marker. ---------------------------- revision 1.198 date: 2010/01/18 10:35:57; author: ianmacd; state: Exp; lines: +63 -63 Fix some space vs. tab formatting. Bump version to 1.7.0. ---------------------------- revision 1.197 date: 2010/01/18 10:20:12; author: ianmacd; state: Exp; lines: +3 -3 Modified a regex for possible greater efficiency. ---------------------------- revision 1.196 date: 2010/01/18 01:15:57; author: ianmacd; state: Exp; lines: +3 -3 Minor textual change to a debugging message. ---------------------------- revision 1.195 date: 2010/01/18 00:19:43; author: ianmacd; state: Exp; lines: +56 -30 Move some safe code out of a begin/rescue/end block. Show much more programme data with --debug. Actors are now listed by UPC with the keyword 'Met' instead of 'Cast'. Ratings are now listed by UPC with the keyword 'Leeftijd' instead of 'Kijkwijzer'. 
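Revisions 1.201, 1.203 and 1.205 together describe a back-off: when the server throttles a request (HTTP 503), the sleep between GETs grows by a fixed step. The following is only a sketch of that idea. The method and constant names are hypothetical, and the grabber's real control flow is not shown in this log.

```ruby
# Hypothetical sketch of the back-off; names are illustrative only.
BACKOFF_STEP = 0.2   # seconds, per revision 1.205 (1.201 originally used 0.25)

# Returns the sleep interval to use before the next HTTP GET, given the
# status code of the previous response.
def next_sleep(current_sleep, http_status)
  if http_status == 503                       # throttled, per revision 1.203
    (current_sleep + BACKOFF_STEP).round(2)   # round to keep float noise out
  else
    current_sleep
  end
end

sleep_time = 0.0
[503, 503, 200].each do |status|
  sleep_time = next_sleep(sleep_time, status)
  # %.2f prints the float with two decimal places, as in revision 1.201.
  puts format('sleep is now %.2f s after HTTP %d', sleep_time, status)
end
```

The interval only ever grows in this sketch; whether the grabber later shrinks it again is not stated in the log.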
---------------------------- revision 1.194 date: 2010/01/17 23:18:25; author: ianmacd; state: Exp; lines: +21 -18 --actual is now on by default, because UPC's site can no longer supply full details on programmes that have already aired today. Using --no-actual will just cause warnings about programmes that are in the guide, but have no detailed information. Actually, --actual used to ignore programmes that had ended before the grabber was run. It now ignores programmes whose start hour was less than the current hour. Code clean-up. Programmes now use @time instead of @start_time to temporarily store their start time while their details are being looked up. Ultimately, the value in @time is replaced with a string containing both the start and end times. ---------------------------- revision 1.193 date: 2010/01/17 18:50:40; author: ianmacd; state: Exp; lines: +92 -78 UPC has upset the apple cart again and changed the site around. Consequently, significant chunks of the grabber had to be rewritten. ---------------------------- revision 1.192 date: 2010/01/17 10:47:18; author: ianmacd; state: Exp; lines: +18 -2 Catch the case of no channels being found at UPC and don't proceed. The sample guide fetch for Nederland 1 and RTL 4 shouldn't be attempted if either channel is missing from the channel list obtained from UPC. ---------------------------- revision 1.191 date: 2010/01/06 12:38:34; author: ianmacd; state: Exp; lines: +45 -17 Add --ignore-errors, which will allow us to continue when we fail to find any programme's data. This usually happens when a page of Web server errors is returned, which causes our regex matching to fail to find title, time, etc. Display CVS Id tag if --debug is used. Update copyright message to 2010. ---------------------------- revision 1.190 date: 2009/12/06 14:58:02; author: ianmacd; state: Exp; lines: +3 -3 Minor text message change. 
---------------------------- revision 1.189 date: 2009/12/04 09:22:14; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.6.1. ---------------------------- revision 1.188 date: 2009/12/04 08:35:07; author: ianmacd; state: Exp; lines: +3 -3 Fixed regex to enable IMDB dynamic ratings to work again. ---------------------------- revision 1.187 date: 2009/12/04 08:34:10; author: ianmacd; state: Exp; lines: +4 -4 Minor adjustment to text message. ---------------------------- revision 1.186 date: 2009/07/17 21:21:57; author: ianmacd; state: Exp; lines: +6 -3 Need to strip programme description of whitespace even when there's no extra detail page. ---------------------------- revision 1.185 date: 2009/06/15 06:59:55; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.6.0. ---------------------------- revision 1.184 date: 2009/06/15 00:03:49; author: ianmacd; state: Exp; lines: +4 -3 Fix Ruby 1.9 encoding issue with --static-ratings. ---------------------------- revision 1.183 date: 2009/06/14 23:14:20; author: ianmacd; state: Exp; lines: +9 -9 'require' YAML only when it's sub-1.8. Remove a forced encoding that turned out not to be necessary. ---------------------------- revision 1.182 date: 2009/06/14 18:49:56; author: ianmacd; state: Exp; lines: +29 -22 More encoding fixes for Ruby 1.9. Use Marshal instead of YAML for serialisation of the IMDB ratings cache in Ruby 1.9. YAML just doesn't seem to want to work with non UTF-8 encodings. Fix the treatment of HTTP codes other than 200, which weren't properly being dealt with as an error. ---------------------------- revision 1.181 date: 2009/06/12 09:00:47; author: ianmacd; state: Exp; lines: +33 -24 Various encoding fixes for Ruby 1.9. ---------------------------- revision 1.180 date: 2009/06/03 07:31:22; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.5.3. 
---------------------------- revision 1.179 date: 2009/06/03 07:30:52; author: ianmacd; state: Exp; lines: +7 -6 Literal ampersands were not always being replaced in programme titles, subtitles and descriptions. ---------------------------- revision 1.178 date: 2009/05/22 08:55:08; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.5.2. ---------------------------- revision 1.177 date: 2009/05/22 08:54:47; author: ianmacd; state: Exp; lines: +7 -3 Channel names with a slash, such as 'Ketnet / Canvas' and 'Nick/Comedy C.', must have the slash replaced by a backslash, in order for the resulting URLs to work with the UPC site. ---------------------------- revision 1.176 date: 2009/05/22 00:03:34; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.5.1. ---------------------------- revision 1.175 date: 2009/05/22 00:03:15; author: ianmacd; state: Exp; lines: +7 -5 Condense two cache reporting lines into a single one. ---------------------------- revision 1.174 date: 2009/05/21 20:30:22; author: ianmacd; state: Exp; lines: +4 -4 Fix silly method name errors that caused --configure not to work. ---------------------------- revision 1.173 date: 2009/05/16 18:53:32; author: ianmacd; state: Exp; lines: +4 -4 Fix an informational path. Bump version to 1.5.0. ---------------------------- revision 1.172 date: 2009/05/16 16:23:53; author: ianmacd; state: Exp; lines: +95 -29 Handle actors, directors and censor board ratings, if these are available in the UPC data. Move the handling of presenters into Programme.get_detail, where it logically belongs. ---------------------------- revision 1.171 date: 2009/05/16 12:06:57; author: ianmacd; state: Exp; lines: +50 -34 Don't bother fetching the extra page that contains the full programme description if the description on the detail page hasn't been abbreviated. This can be determined by the presence of an ellipsis at the end of the description. If it's not abbreviated, we can use that one. 
Replace literal ampersands in the description with &amp; entities. In Channel.get_available, sort the UPC channel display list case-insensitively. ---------------------------- revision 1.170 date: 2009/05/15 21:51:24; author: ianmacd; state: Exp; lines: +952 -688 Create new classes, such as Programme, Guide and Channel and move methods into their logical location. Add --[no]-cache [DIR]. --cache is the default and uses ~/.xmltv/programme.cache. If DIR is given, that directory is used for programme.cache instead. ---------------------------- revision 1.169 date: 2009/05/13 18:07:11; author: ianmacd; state: Exp; lines: +164 -150 Massive changes to repair damage caused by total overhaul of UPC TV guide. ---------------------------- revision 1.168 date: 2009/04/20 09:09:03; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.1.0. ---------------------------- revision 1.167 date: 2009/04/19 16:12:07; author: ianmacd; state: Exp; lines: +17 -11 New option, --ignore-negatives, allows negative entries in the ratings cache to be ignored, so that a look-up will be attempted even though a previous look-up had failed. This is primarily to aid the author in debugging potential improvements to the ratings system. The regex code to skip poster thumbnails during a rating look-up was actually unnecessary. The regex code to allow a film to be found under an 'a.k.a.' title was severely flawed and could easily result in exponential backtracking. ---------------------------- revision 1.166 date: 2009/04/18 23:19:20; author: ianmacd; state: Exp; lines: +24 -10 Further improvements to IMDB rating look-ups, by allowing numeric HTML entity matching to match on decimal or hexadecimal. 
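Revision 1.166 refers to matching non-alphanumeric title characters against their numeric HTML entity in either decimal or hexadecimal form. One way this could be done is sketched below. The helper name and the exact pattern are assumptions made for illustration, not the grabber's own regex.

```ruby
# Hypothetical helper, not the grabber's code: build a regex from a film
# title in which each non-alphanumeric character matches either itself or
# its numeric HTML entity in decimal (&#39;) or hexadecimal (&#x27;) form.
def title_regex(title)
  pattern = title.each_char.map do |ch|
    if ch =~ /[A-Za-z0-9 ]/
      ch   # letters, digits and spaces are matched literally
    else
      format('(?:%s|&#%d;|&#x0*%x;)', Regexp.escape(ch), ch.ord, ch.ord)
    end
  end.join
  # Case-insensitive, so upper-case hex digits in an entity also match.
  Regexp.new(pattern, Regexp::IGNORECASE)
end

re = title_regex("Rosemary's Baby")
puts(re =~ 'Rosemary&#x27;s Baby' ? 'matched' : 'no match')
```

Because each alternation is anchored on a single character, this pattern avoids the nested quantifiers that revision 1.167 blames for exponential backtracking.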
---------------------------- revision 1.165 date: 2009/04/18 13:07:31; author: ianmacd; state: Exp; lines: +4 -3 When performing a dynamic IMDB look-up, allow 'and' in the title of a film to also match & or &amp;. ---------------------------- revision 1.164 date: 2009/04/18 09:32:29; author: ianmacd; state: Exp; lines: +16 -5 Improve IMDB ratings detection, so that films with original titles in a different language are still found if their 'a.k.a.' line at IMDB contains the title we're looking for. ---------------------------- revision 1.163 date: 2009/04/17 23:06:04; author: ianmacd; state: Exp; lines: +22 -12 When looking up a film rating on IMDB, we now need to detect and ignore the HTML for film poster thumbnails. Improve film title matching against IMDB when dynamically looking up ratings. Non-alphanumeric characters in titles, such as apostrophes, can also be matched against their numeric HTML entity form. Another improvement has been made to dynamic rating look-ups. Films with titles of the form 'Star Is Born, A' will be retried as 'A Star Is Born' if the initial look-up fails. ---------------------------- revision 1.162 date: 2009/04/16 23:03:16; author: ianmacd; state: Exp; lines: +3 -3 When splitting a title into title/subtitle, allow for the possibility of multiple consecutive colons. ---------------------------- revision 1.161 date: 2009/04/13 23:39:13; author: ianmacd; state: Exp; lines: +28 -8 Add new option, --prune-subtitles, which prunes trailing full-stops from programme subtitles. Sometimes, the presence of a full-stop is the only difference between two showings of the same programme. This option allows the user to eliminate that difference. ---------------------------- revision 1.160 date: 2009/03/19 12:59:32; author: ianmacd; state: Exp; lines: +3 -4 --threads no longer marked as experimental. ---------------------------- revision 1.159 date: 2009/03/12 21:55:56; author: ianmacd; state: Exp; lines: +9 -7 Bump version to 1.0.1. 
Report which channel was affected when a guide data page fetch fails. Report which channel was affected when an icon path can't be determined. Simplified a nested if. Update date in copyright message. ---------------------------- revision 1.158 date: 2009/03/12 00:48:50; author: ianmacd; state: Exp; lines: +9 -8 http://epg.upc.nl has gone away and we now need to use http://www.upclive.nl ---------------------------- revision 1.157 date: 2008/12/25 19:01:49; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 1.0.0. ---------------------------- revision 1.156 date: 2008/12/25 06:16:23; author: ianmacd; state: Exp; lines: +3 -3 Dynamic IMDB ratings were no longer working due to HTML change. ---------------------------- revision 1.155 date: 2008/12/25 06:02:49; author: ianmacd; state: Exp; lines: +7 -2 Define new option, --ignore-old, as an alias for --actual. ---------------------------- revision 1.154 date: 2008/12/24 05:24:02; author: ianmacd; state: Exp; lines: +4 -4 Minor text changes. ---------------------------- revision 1.153 date: 2008/11/01 17:12:51; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.10.1. ---------------------------- revision 1.152 date: 2008/11/01 17:12:24; author: ianmacd; state: Exp; lines: +7 -7 Fix reporting of IMDB rating totals. ---------------------------- revision 1.151 date: 2008/10/31 15:54:49; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.10.0. ---------------------------- revision 1.150 date: 2008/10/30 03:56:30; author: ianmacd; state: Exp; lines: +12 -2 If local time zone is not CET/CEST, programme start and finish times will be incorrect in the output. In this case, we issue a warning when --sanity-check is used (unless --quiet is also used). ---------------------------- revision 1.149 date: 2008/10/29 15:12:54; author: ianmacd; state: Exp; lines: +5 -5 Minor alteration to strings reporting ratings. 
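Revision 1.163 describes retrying a failed IMDB look-up with a trailing article moved to the front ('Star Is Born, A' becomes 'A Star Is Born'). A rough sketch of such a rewrite is given here. The helper name and the list of articles are assumptions, since the log only gives examples with 'A' and 'The'.

```ruby
# Hypothetical helper, not taken from the grabber. The article list is an
# assumption; the log only shows examples with 'A' and 'The'.
ARTICLES = %w[A An The].freeze

# Returns the title with a trailing ', Article' moved to the front, or nil
# if the title does not have that form (so the caller knows not to retry).
def reorder_title(title)
  m = title.match(/\A(.+),\s+(#{ARTICLES.join('|')})\z/i)
  m && "#{m[2]} #{m[1]}"
end

puts reorder_title('Star Is Born, A')
```

Returning nil for titles that do not fit the pattern lets a caller skip a pointless second look-up.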
---------------------------- revision 1.148 date: 2008/10/27 02:48:53; author: ianmacd; state: Exp; lines: +2 -6 --actual together with --debug no longer displays detailed information on which programmes were ignored. ---------------------------- revision 1.147 date: 2008/10/27 02:37:29; author: ianmacd; state: Exp; lines: +23 -18 Information given by --actual in combination with --verbose is now just a summary. The original, more detailed information is displayed by --debug. Changed a few text messages. ---------------------------- revision 1.146 date: 2008/10/26 15:48:01; author: ianmacd; state: Exp; lines: +45 -19 New option, --actual, ignores programmes that have already ended. --verbose now reports ignored programmes if --actual was used. --threads now has a negative counterpart, --no-threads (which is still the default). Copyright message displayed by --version updated to 2008. Some minor changes to various text messages. ---------------------------- revision 1.145 date: 2008/10/25 20:45:11; author: ianmacd; state: Exp; lines: +6 -5 Minor code clean-up of option toggles. ---------------------------- revision 1.144 date: 2008/10/25 20:25:55; author: ianmacd; state: Exp; lines: +31 -8 Provide a check to ensure that summer/winter time changes don't mess with the calculation of day names in DAYS_OF_WEEK. Provide a warning when a programme starts in summer time, but ends in winter time, as the duration may then be off by ±1 hour. ---------------------------- revision 1.143 date: 2008/10/21 01:59:56; author: ianmacd; state: Exp; lines: +2 -8 Remove the long defunct --schema option. ---------------------------- revision 1.142 date: 2008/10/21 01:56:43; author: ianmacd; state: Exp; lines: +12 -7 Deal with the case of a programme's end time being before its start time when the hour is the same. For example, a programme with a start time of 02:50 and an end time of 02:30 would not have been treated as if it were 23h 40m long. The case where the hour is different, e.g. 
02:50 - 01:50, was already handled correctly. ---------------------------- revision 1.141 date: 2008/10/13 19:01:09; author: ianmacd; state: Exp; lines: +2 -20 Remove legacy ratings cache code. ---------------------------- revision 1.140 date: 2008/10/12 15:46:41; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.10. ---------------------------- revision 1.139 date: 2008/10/12 15:46:09; author: ianmacd; state: Exp; lines: +24 -5 When using --threads in combination with --static-ratings, the first thread must completely read the static ratings file into the @@ratings hash before other threads attempt to use the hash for look-ups. Otherwise, we get a lot of false negatives. Using a mutex protects us against this. ---------------------------- revision 1.138 date: 2008/10/12 09:41:50; author: ianmacd; state: Exp; lines: +9 -12 Remove rating regex duplication by defining a variable for a regex that is used twice. Corrected some formatting for lines > 80 chars. Fixed the text of a rarely seen error message about guide rotation not having occurred after midnight. ---------------------------- revision 1.137 date: 2008/10/12 09:15:43; author: ianmacd; state: Exp; lines: +4 -4 Dynamic IMDB ratings were no longer working due to HTML change. ---------------------------- revision 1.136 date: 2008/10/11 18:44:09; author: ianmacd; state: Exp; lines: +10 -5 Add a few more words to the list of acceptable lower-case joiner words when searching for a subtitle. ---------------------------- revision 1.135 date: 2008/10/08 16:16:36; author: ianmacd; state: Exp; lines: +6 -5 Add "'n'" to the list of acceptable lower-case joiner words when searching for a subtitle. Fix --ratings warning to use spaces instead of tabs. ---------------------------- revision 1.134 date: 2008/10/07 23:09:04; author: ianmacd; state: Exp; lines: +3 -3 Add 'vs' to the list of acceptable lower-case joiner words when searching for a subtitle. 
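Revision 1.139 explains that, with --threads and --static-ratings, the first thread must finish filling the @@ratings hash before any other thread reads it, and that a mutex enforces this. The sketch below shows the general shape under stated assumptions: the class name, method names and the sample entry are invented for illustration.

```ruby
require 'thread'   # needed for Mutex on Ruby 1.8; harmless on later versions

# Hypothetical sketch: the first thread to arrive loads the ratings while
# holding the lock; later threads block until the hash is complete.
class Ratings
  @@ratings = nil
  @@lock = Mutex.new

  def self.lookup(title)
    @@lock.synchronize do
      @@ratings ||= load_static_ratings
    end
    @@ratings[title]
  end

  # Stand-in for parsing the static ratings file; the entry is made up.
  def self.load_static_ratings
    sleep 0.05   # simulate a slow file read
    { 'Example Film' => '7.0/10' }
  end
end

threads = Array.new(4) { Thread.new { Ratings.lookup('Example Film') } }
results = threads.map(&:value)
p results
```

Without the lock, a thread could see a partly filled hash and report a missing rating, which is the false negative the revision describes.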
---------------------------- revision 1.133 date: 2008/06/17 05:26:30; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.9. ---------------------------- revision 1.132 date: 2008/06/17 05:10:30; author: ianmacd; state: Exp; lines: +10 -8 --sanity-check's attempt to determine whether something's wrong with the guide data fetches a sample page for Nederland 1. If there are < 5 programmes on that page, we abort. This check isn't trustworthy, so now we check both Nederland 1 and RTL 4, aborting if there are < 10 programmes in total. ---------------------------- revision 1.131 date: 2008/06/01 12:13:21; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.8. ---------------------------- revision 1.130 date: 2008/06/01 12:12:59; author: ianmacd; state: Exp; lines: +3 -3 Add 'het' to the list of acceptable lower-case joiner words when searching for a subtitle. ---------------------------- revision 1.129 date: 2008/06/01 07:11:18; author: ianmacd; state: Exp; lines: +3 -3 Add 'or' to the list of acceptable lower-case joiner words when searching for a subtitle. ---------------------------- revision 1.128 date: 2008/05/18 22:31:15; author: ianmacd; state: Exp; lines: +3 -2 Use PASV mode for FTP'ing IMDB static ratings file. Thanks to Alain Hertog for suggesting this. ---------------------------- revision 1.127 date: 2008/04/26 10:30:03; author: ianmacd; state: Exp; lines: +17 -7 Compatibility fixes to run under Ruby 1.9. ---------------------------- revision 1.126 date: 2008/04/22 09:02:41; author: ianmacd; state: Exp; lines: +3 -3 Bump to 0.9.7. ---------------------------- revision 1.125 date: 2008/04/21 14:15:58; author: ianmacd; state: Exp; lines: +3 -3 Fix REXML insertion of newlines into XML in versions of Ruby 1.8.6 somewhere after patch level 36. ---------------------------- revision 1.124 date: 2008/04/21 13:25:59; author: ianmacd; state: Exp; lines: +4 -3 Category mapping Detective should go to Crime/Mystery, not Drama. 
New category mapping Documentaire => Documentary. ---------------------------- revision 1.123 date: 2008/02/22 21:59:00; author: ianmacd; state: Exp; lines: +4 -4 Bump to 0.9.6. ---------------------------- revision 1.122 date: 2008/02/21 21:31:35; author: ianmacd; state: Exp; lines: +8 -2 When creating the directory needed by --config-file, check to see if it exists but is not a directory. ---------------------------- revision 1.121 date: 2008/02/21 21:06:08; author: ianmacd; state: Exp; lines: +82 -53 Fix IMDB dynamic ratings after changes to the IMDB site made many look-ups fail. Further improve the chance of a successful IMDB look-up by retrying unfound titles of the form 'Foo Bar, The' as 'The Foo Bar'. Improve the final report with details of number of pages and programmes fetched per second. ---------------------------- revision 1.120 date: 2007/11/22 18:42:34; author: ianmacd; state: Exp; lines: +5 -5 Bump version to 0.9.5. ---------------------------- revision 1.119 date: 2007/11/22 14:24:57; author: ianmacd; state: Exp; lines: +13 -12 Fix fairly rare occurrence whereby programme that starts after midnight and runs until the next day does not have its end date adjusted accordingly. Programmes that start after midnight have a day added to their start and end time. Programmes whereby the end hour is less than the start hour have a day added to the end time. The bug occurred because these two conditions were in an if/elsif clause, but both conditions occasionally apply to a programme. For example, if program X starts at 05:00 on day N, it actually belongs to day N + 1, so the start and end dates are adjusted by +1. However, if its end hour is 00:00, then it actually runs until midnight on day N + 2, so we still need to add a day to the end date. ---------------------------- revision 1.118 date: 2007/08/27 20:47:16; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.4. 
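Revision 1.119 turns on the point that two date adjustments are independent and can both apply to one programme, so they cannot sit in a single if/elsif. A minimal sketch of the corrected logic follows. The method name and the after_midnight flag are hypothetical stand-ins for however the grabber detects a post-midnight listing.

```ruby
require 'date'

# Hypothetical sketch, not the grabber's code. guide_date is the day the
# guide page belongs to (day N); after_midnight says whether the listing
# falls in that page's post-midnight part. Returns [start_date, end_date].
def programme_dates(guide_date, start_hour, end_hour, after_midnight)
  start_date = guide_date
  end_date = guide_date
  if after_midnight          # condition 1: really belongs to day N + 1
    start_date += 1
    end_date += 1
  end
  if end_hour < start_hour   # condition 2: runs past the next midnight
    end_date += 1
  end
  [start_date, end_date]
end

day_n = Date.new(2007, 11, 22)
# The example from the log: starts 05:00 in the post-midnight part, ends 00:00.
p programme_dates(day_n, 5, 0, true).map(&:to_s)
```

With two separate if statements, the log's example gets a start date of N + 1 and an end date of N + 2, which an if/elsif could not produce.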
---------------------------- revision 1.117 date: 2007/08/27 19:18:19; author: ianmacd; state: Exp; lines: +24 -23 Changes to UPC's site broke --configure and caused bogus warnings when --sanity-check was used. ---------------------------- revision 1.116 date: 2007/08/17 13:21:07; author: ianmacd; state: Exp; lines: +3 -3 Updated to 0.9.3. ---------------------------- revision 1.115 date: 2007/08/17 13:18:35; author: ianmacd; state: Exp; lines: +14 -13 Regular expression for detecting movies did not have /x, with effect that films with the genre 'Romantiek' were not considered to be films. Also, any programme with the genre 'Speelfilm' is now considered to be a film, regardless of its length. ---------------------------- revision 1.114 date: 2007/08/16 20:04:33; author: ianmacd; state: Exp; lines: +20 -6 Category tag should ideally have a lang attribute: 'en' when category translation occurs, otherwise 'nl'. If we find an episode number, we should create an episode-num tag. ---------------------------- revision 1.113 date: 2007/08/16 00:25:47; author: ianmacd; state: Exp; lines: +4 -6 Update to 0.9.2. ---------------------------- revision 1.112 date: 2007/08/14 23:47:16; author: ianmacd; state: Exp; lines: +13 -4 When trying to derive a subtitle, we now check for a trailing episode string at the end of the description. If we find one, we append it to the already derived subtitle, if applicable, and use that as the subtitle. This increases the chance of finding a usable subtitle and also increases the chance of subtitle uniqueness, which in turn increases MythTV's chance of detecting duplicate programmes. By default, it does this by looking for a unique subtitle/description pair. ---------------------------- revision 1.111 date: 2007/07/20 19:33:54; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.1. 
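Revision 1.115 attributes a bug to a regular expression that was missing the /x flag. The effect is easy to reproduce: without /x, the whitespace and newlines used to lay out a multi-line pattern become part of the pattern. The two regexes below are illustrative only and are not the grabber's actual film-detection expression.

```ruby
# Illustrative patterns only; the genre names come from the log entry.
# Without /x, the newline and indentation are literal characters, so the
# second alternative only matches 'Romantiek' preceded by that whitespace.
without_x = /Speelfilm |
             Romantiek/

# With /x, unescaped whitespace in the pattern is ignored.
with_x = /Speelfilm |
          Romantiek/x

puts "without /x: #{without_x =~ 'Romantiek' ? 'match' : 'no match'}"
puts "with /x:    #{with_x =~ 'Romantiek' ? 'match' : 'no match'}"
```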
---------------------------- revision 1.110 date: 2007/07/20 16:45:00; author: ianmacd; state: Exp; lines: +4 -2 New category translations: Gezondheid => Health/Medical Sportmagazine => Sports ---------------------------- revision 1.109 date: 2007/07/20 15:05:45; author: ianmacd; state: Exp; lines: +38 -24 --[no-]ratings can now take a parameter, DIR. If given, DIR/ratings_cache.yaml is used for the ratings cache instead of ~/.xmltv/ratings_cache.yaml ---------------------------- revision 1.108 date: 2007/07/17 14:27:44; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.9.0. ---------------------------- revision 1.107 date: 2007/07/17 13:50:51; author: ianmacd; state: Exp; lines: +7 -8 Move a calculation outside of a loop, as it doesn't need to be recalculated on each iteration. ---------------------------- revision 1.106 date: 2007/07/16 19:55:45; author: ianmacd; state: Exp; lines: +29 -33 Simplify construction of usage message in option parser. ---------------------------- revision 1.105 date: 2007/07/16 13:59:26; author: ianmacd; state: Exp; lines: +7 -7 Change source-info-name to remove reference to Chello in accordance with UPC's abandonment of the brand name. ---------------------------- revision 1.104 date: 2007/07/15 00:41:21; author: ianmacd; state: Exp; lines: +29 -5 We now trap ^C at the command line to avoid displaying the call stack as we exit. We now allow debugging (normally turned on with --debug) to be toggled by sending the process a SIGUSR1. We now allow verbosity (normally turned on with --verbose) to be toggled by sending the process a SIGUSR2. ---------------------------- revision 1.103 date: 2007/07/14 00:52:07; author: ianmacd; state: Exp; lines: +6 -5 The name Chello seems to be on the way out at UPC, so switch the base URL from http://www.chello.nl to http://epg.upc.nl. 
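Revision 1.104 describes trapping ^C and toggling debugging and verbosity with SIGUSR1 and SIGUSR2. A sketch of how that could look with Ruby's Signal.trap follows. The global variable names and the exit status are assumptions, and the grabber may handle ^C differently (for example by rescuing Interrupt).

```ruby
# Hypothetical sketch; variable names and the exit status are assumptions.
$show_debug = false
$show_verbose = false

# Toggle the flags when the process receives SIGUSR1 or SIGUSR2.
Signal.trap('USR1') { $show_debug = !$show_debug }
Signal.trap('USR2') { $show_verbose = !$show_verbose }

# Leave quietly on ^C instead of printing a call stack.
Signal.trap('INT') { exit 1 }
```

From a shell, `kill -USR1 PID` would then flip debugging on a running process, where PID is the grabber's process ID.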
---------------------------- revision 1.102 date: 2007/07/14 00:42:53; author: ianmacd; state: Exp; lines: +7 -6 Don't split a programme title on the colon to form a subtitle if the colon looks like it is a time separator, i.e. HH:MM. ---------------------------- revision 1.101 date: 2007/07/13 08:40:11; author: ianmacd; state: Exp; lines: +8 -8 Reuse of a variable name caused unthreaded mode to crash if a TV programme had parsable presenter names. ---------------------------- revision 1.100 date: 2007/07/11 17:42:00; author: ianmacd; state: Exp; lines: +28 -15 New method pre_checks performs pre-execution sanity checks requested by --sanity-check. New sanity check aborts program if we're running with an effective UID of 0. ---------------------------- revision 1.99 date: 2007/07/11 12:05:15; author: ianmacd; state: Exp; lines: +138 -125 More consistent use of quotes and % operator for string interpolation. Better reporting of what was fetched, as reporting is non-linear in threaded mode. ---------------------------- revision 1.98 date: 2007/07/10 22:51:33; author: ianmacd; state: Exp; lines: +110 -92 New option --threads causes one thread per channel to be used for fetching programme data. This is heavy on network and server resources, but causes the program to execute in a fraction of the time required in unthreaded mode. ---------------------------- revision 1.97 date: 2007/07/10 22:40:17; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.9. ---------------------------- revision 1.96 date: 2007/07/10 13:04:02; author: ianmacd; state: Exp; lines: +20 -7 Short-circuit use of complex regex for extracting a program subtitle from the description, when the description cannot possibly contain one. The following program description was found to cause exponential backtracking when trying to determine a subtitle: 1300BST: US PGA Tour Golf, 1400BST: Challenge Series Golf, 1530BST: WTA Tennis, 1600BST: ICC Cricket, 1630BST: ATP Tennis. 
The problem was severe enough that this description would cause the program to loop for hours within the subtitle matching regex. ---------------------------- revision 1.95 date: 2007/07/10 11:11:33; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.8. ---------------------------- revision 1.94 date: 2007/06/30 13:32:36; author: ianmacd; state: Exp; lines: +27 -10 Create ~/.xmltv if it doesn't already exist. Otherwise, --configure will cause an error when the file comes to be written. Likewise, IMDB ratings files would be unable to be written. If --config-file is used, we may also need to create the directory path to the named file. --config-file did not properly expand its parameter. When the config file did not exist or there were no channels defined in it, the resulting error message would, itself, produce an error due to an incorrect variable name. ---------------------------- revision 1.93 date: 2007/06/15 15:10:09; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.7. ---------------------------- revision 1.92 date: 2007/06/13 00:29:51; author: ianmacd; state: Exp; lines: +33 -12 Improved some of the text messages. Better reporting of successful IMDB look-ups: --verbose now also displays the rating found. IMDB ratings cache entries now have their creation time stored along with their last hit time. ---------------------------- revision 1.91 date: 2007/05/11 00:38:15; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.6. ---------------------------- revision 1.90 date: 2007/05/11 00:37:58; author: ianmacd; state: Exp; lines: +3 -3 Rating did not receive suffix of /10 when --static-ratings was used. ---------------------------- revision 1.89 date: 2007/05/08 17:39:08; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.5. 
---------------------------- revision 1.88 date: 2007/05/08 17:38:23; author: ianmacd; state: Exp; lines: +9 -6 Prevent infrequent case of division-by-zero error when reporting IMDB look-up percentages. ---------------------------- revision 1.87 date: 2007/04/26 14:01:10; author: ianmacd; state: Exp; lines: +3 -4 Bump version to 0.8.4. ---------------------------- revision 1.86 date: 2007/04/25 20:51:01; author: ianmacd; state: Exp; lines: +9 -9 Fixed bug whereby reporting of percentages of looked-up vs. cached IMDB ratings could add up to 101%. This was due to a rounding error, which occurred when the mantissa of both percentages was .5, causing them to both be rounded upwards. ---------------------------- revision 1.85 date: 2007/04/12 09:15:31; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.3. ---------------------------- revision 1.84 date: 2007/04/12 09:15:10; author: ianmacd; state: Exp; lines: +10 -7 Count of cached entries when reading ~/.xmltv/ratings_cache.yaml was including expired entries. ---------------------------- revision 1.83 date: 2007/04/09 21:28:58; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.8.2. ---------------------------- revision 1.82 date: 2007/04/09 13:27:45; author: ianmacd; state: Exp; lines: +9 -5 Update cache entry timestamp when a cache entry is accessed. This prevents recurring negative entries for bogus film titles like 'Channel Off Air' from ever expiring, which is a good thing. ---------------------------- revision 1.81 date: 2007/04/06 18:43:01; author: ianmacd; state: Exp; lines: +21 -17 Rewrite deletion of expired ratings cache entries. When loading ratings cache, report how many entries are positive and how many are negative. ---------------------------- revision 1.80 date: 2007/04/04 12:40:58; author: ianmacd; state: Exp; lines: +12 -15 Condense rating statistics somewhat. 
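The two reporting bugs fixed in revisions 1.88 and 1.86 (division by zero, and percentages summing to 101%) can both be avoided with an approach like the one below. This is a sketch under assumptions, not the script's actual code; the method name lookup_percentages is hypothetical.

```ruby
# Returns [fetched_pct, cached_pct] as whole numbers that always sum to 100
# (or [0, 0] when there were no look-ups at all).
def lookup_percentages(fetched, cached)
  total = fetched + cached
  return [0, 0] if total.zero?          # avoids division by zero

  fetched_pct = (fetched * 100.0 / total).round
  # Derive the second figure from the first, so that two values ending
  # in .5 cannot both be rounded upwards.
  [fetched_pct, 100 - fetched_pct]
end
```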
---------------------------- revision 1.79 date: 2007/04/03 22:50:52; author: ianmacd; state: Exp; lines: +156 -43 Dynamically looking up ratings in IMDB now makes use of a persistent cache, which is written out to ~/.xmltv/ratings_cache.yaml. Dynamic look-ups are therefore now semi-static. Entries expire after seven days, so one can now make use of dynamic look-ups whilst dramatically reducing the traffic sent to IMDB. In accordance with the new functionality described above, end-reporting of dynamic ratings with --verbose is now much better, detailing how many look-ups were attempted, how many succeeded, how many of the successes and failures were respectively positive and negative cache hits, etc. The code has been cleaned up a bit, with calls to Time.now replaced by a constant when an up-to-date value since being run isn't required. ---------------------------- revision 1.78 date: 2007/03/31 12:26:08; author: ianmacd; state: Exp; lines: +5 -4 Bump version to 0.8.1 ---------------------------- revision 1.77 date: 2007/03/30 14:25:53; author: ianmacd; state: Exp; lines: +7 -6 Remove external dependencies on UNIX date(1). ---------------------------- revision 1.76 date: 2007/03/23 16:13:09; author: ianmacd; state: Exp; lines: +11 -3 Bump to 0.8.0. ---------------------------- revision 1.75 date: 2007/03/22 10:23:12; author: ianmacd; state: Exp; lines: +3 -3 Sort channels and theme list printed by --debug case-insensitively. ---------------------------- revision 1.74 date: 2007/03/21 16:24:04; author: ianmacd; state: Exp; lines: +8 -2 When --static-ratings is used with --verbose, display the number of ratings read. ---------------------------- revision 1.73 date: 2007/03/21 16:18:39; author: ianmacd; state: Exp; lines: +13 -5 --static-ratings was not finding a large number of the films that it should have, due to a faulty regex and lack of case-insensitive matching. 
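A persistent ratings cache with seven-day expiry, as introduced in revision 1.79, might look something like this. The on-disk layout (title mapped to a rating and an integer timestamp) and the method names are assumptions made for illustration; the real format of ratings_cache.yaml is not documented in this log.

```ruby
require 'yaml'

SEVEN_DAYS = 7 * 24 * 60 * 60

# Assumed layout: { title => { 'rating' => Float or nil, 'stamp' => epoch seconds } }
# A nil rating represents a negative (failed) look-up.
def load_cache(path, now = Time.now.to_i)
  return {} unless File.exist?(path)

  cache = YAML.load_file(path) || {}
  # Drop entries older than seven days.
  cache.reject { |_title, entry| now - entry['stamp'] > SEVEN_DAYS }
end

def save_cache(path, cache)
  File.write(path, cache.to_yaml)
end
```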
---------------------------- revision 1.72 date: 2007/03/21 15:27:48; author: ianmacd; state: Exp; lines: +130 -42 New option --static-ratings offers the ability to use IMDB for film ratings, but in accordance with the policy laid down here: http://www.imdb.com/help/show_leaf?usedatasoftware Consequently, a local ratings file is downloaded from ftp.funet.fi (by the new class method Rating.get_ratings_list) and placed in ~/.xmltv, where it is gunzipped and used for rating look-ups. The use of this file is described here: http://www.imdb.com/interfaces#plain The file is downloaded when it does not already exist OR when it's older than seven days (as determined by its mtime). --static-ratings probably won't work on most non-UNIX-like systems, because gunzip is needed to decompress the ratings file. Ratings are now cached using the new class method, Rating.cache_rating. ---------------------------- revision 1.71 date: 2007/03/20 18:32:32; author: ianmacd; state: Exp; lines: +18 -14 Remove warning when user tries to use schema 1. Issue warning about incorrect locale only when --quiet isn't used. When --days > 8, the warning that is issued now comes after the check for useable channels in the config file. --ratings now issues a warning about IMDB policy violation, as defined here: http://imdb.com/help/show_leaf?usedatasoftware ---------------------------- revision 1.70 date: 2007/03/05 15:25:35; author: ianmacd; state: Exp; lines: +5 -5 IMDB has slightly altered its title pages again, so look-ups were failing. Bump version to 0.7.2. ---------------------------- revision 1.69 date: 2007/02/19 23:10:07; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.7.1. ---------------------------- revision 1.68 date: 2007/02/19 23:09:50; author: ianmacd; state: Exp; lines: +4 -4 IMDB has redesigned its title pages, so look-ups were failing. ---------------------------- revision 1.67 date: 2007/02/16 19:33:22; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.7.0.
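The download condition in revision 1.72 is read here as "the ratings file is missing, or its mtime is more than seven days old". A minimal sketch of such a check, with a hypothetical method name:

```ruby
MAX_AGE = 7 * 24 * 60 * 60 # seven days, in seconds

# True when the local ratings file should be (re)downloaded.
def needs_download?(path, now = Time.now)
  !File.exist?(path) || (now - File.mtime(path)) > MAX_AGE
end
```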
---------------------------- revision 1.66 date: 2007/02/16 13:58:53; author: ianmacd; state: Exp; lines: +118 -6 Added the 'manualconfig' capability, via the new option --configure and the new methods configure_grabber and get_channel_number. Channel numbers in the config file may now be preceded by the string 'channel '. This is treated case-insensitively. ---------------------------- revision 1.65 date: 2007/02/15 10:00:39; author: ianmacd; state: Exp; lines: +16 -9 Improve IMDB rating look-ups by working around ampersand entities in the title. ---------------------------- revision 1.64 date: 2007/02/14 00:23:51; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.6.1. ---------------------------- revision 1.63 date: 2007/02/14 00:22:29; author: ianmacd; state: Exp; lines: +21 -5 When looking up movie titles in IMDB, care must be taken with titles containing accented letters, as the ensuing screen-scraping currently fails. This is due to the accented letters in the title not matching the replacement HTML entities in the page. We therefore need to examine the data we receive from UPC and detect titles with UTF-8 accented letters. We convert these to Latin-1 and then render the matching of any non-alphanumeric characters optional. Finally, any accented letters are replaced by a regex that will match the equivalent HTML entity, whether it be numeric or alphabetic. ---------------------------- revision 1.62 date: 2007/02/13 00:48:42; author: ianmacd; state: Exp; lines: +7 -3 Don't convert apostrophes etc. in presenter names to entities on output. ---------------------------- revision 1.61 date: 2007/02/13 00:36:08; author: ianmacd; state: Exp; lines: +97 -205 Remove the last vestiges of schema 1. ---------------------------- revision 1.60 date: 2007/02/12 23:10:43; author: ianmacd; state: Exp; lines: +6 -6 When schema 2 turned up no programmes, schema 1 was erroneously still used for a refetch. 
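Revision 1.63 mentions rendering the matching of non-alphanumeric characters optional when looking up a title. One way to build such a pattern is sketched below. The method name is hypothetical, and the entity handling for accented letters described in that revision is deliberately left out.

```ruby
# Builds a regex in which every non-alphanumeric character of the
# title is optional, so punctuation differences do not prevent a match.
def fuzzy_title_regex(title)
  pattern = title.each_char.map do |ch|
    escaped = Regexp.escape(ch)
    ch.match?(/[[:alnum:]]/) ? escaped : "#{escaped}?"
  end.join
  Regexp.new("\\A#{pattern}\\z")
end
```

For example, `fuzzy_title_regex('Mrs. Henderson')` would match both 'Mrs. Henderson' and 'Mrs Henderson'.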
---------------------------- revision 1.59 date: 2007/02/12 16:15:33; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.6.0. ---------------------------- revision 1.58 date: 2007/02/12 01:06:50; author: ianmacd; state: Exp; lines: +10 -8 Warn in help text that --schema is no longer effective. ---------------------------- revision 1.57 date: 2007/02/11 19:02:38; author: ianmacd; state: Exp; lines: +4 -3 Catch Errno::ECONNRESET exceptions when fetching pages. ---------------------------- revision 1.56 date: 2007/02/11 18:03:38; author: ianmacd; state: Exp; lines: +9 -7 UPC seem to have abandoned their original URL scheme, so we now issue a warning if the user runs the program with --schema 1 and force the schema to be 2. We also no longer retry empty schema 2 fetches using schema 1. ---------------------------- revision 1.55 date: 2007/02/11 17:21:38; author: ianmacd; state: Exp; lines: +15 -11 When trying to obtain a film rating, use fuzzier matching on the title by making any non-alphanumeric characters optional. For example, this allows 'Mrs. Henderson' to match 'Mrs Henderson'. ---------------------------- revision 1.54 date: 2007/02/11 16:11:29; author: ianmacd; state: Exp; lines: +79 -77 New category mappings: Actie => Action Historisch => History Removed a lot of superfluous whitespace. ---------------------------- revision 1.53 date: 2007/02/11 13:36:48; author: ianmacd; state: Exp; lines: +167 -19 --[no-]ratings is a new option for obtaining film ratings from IMDB. A programme is judged to be a film when its duration is between 80 minutes and 4 hours, and its genre is likely that of a film. New Rating class for dealing with programme ratings. The class method Rating::imdb_rating obtains film ratings for a given title from IMDB. Both positive and negative look-ups are cached to reduce network traffic and allow the programme to run as fast as possible.
The get_page method now follows HTTP 3xx redirections, as these are sometimes given by IMDB (when only one match exists for a given title). --debug will inform you when a redirect is being followed. When --verbose is used, the number of ratings for each day per channel, each channel, and the entire program run is displayed. Furthermore, the name of each programme will be displayed as we attempt to rate it, as well as whether or not a rating was fetched or found in the cache. ---------------------------- revision 1.52 date: 2007/02/05 12:30:10; author: ianmacd; state: Exp; lines: +3 -3 Double quotes, not single, are needed here. ---------------------------- revision 1.51 date: 2007/02/03 01:08:31; author: ianmacd; state: Exp; lines: +3 -3 An 'exit' command was still commented out for debugging purposes. ---------------------------- revision 1.50 date: 2007/02/03 00:36:06; author: ianmacd; state: Exp; lines: +16 -5 When using --verbose AND --debug, the channel/theme list obtained from UPC is now displayed. The presenter section of the credits section was not properly detecting presenters in English language descriptions. Presenter names containing an apostrophe, e.g. Conan O'Brien, were erroneously being detected as two presenters, e.g. Conan and Brien. More presenters are now detected, by additionally looking for the string 'hosted by' in programme descriptions. ---------------------------- revision 1.49 date: 2007/01/24 14:31:56; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.5.1. ---------------------------- revision 1.48 date: 2007/01/24 14:26:29; author: ianmacd; state: Exp; lines: +4 -4 Programme start and stop times now have their time zones expressed as an hour offset from GMT, in accordance with http://www.xmltv.org/wiki/xmltvcapabilities.html. CET and CEST are not allowed. ---------------------------- revision 1.47 date: 2007/01/24 13:51:08; author: ianmacd; state: Exp; lines: +3 -3 ENV['LANG'] can be nil, as well as ''. 
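Revision 1.48 switches programme times to a numeric offset from GMT rather than a zone name such as CET or CEST. Assuming the usual XMLTV timestamp layout of YYYYMMDDhhmmss followed by a +HHMM offset, a formatter could be sketched as follows (the method name is hypothetical):

```ruby
# Formats a Time as an XMLTV-style timestamp with a numeric UTC offset,
# e.g. "+0100" instead of "CET".
def xmltv_time(time)
  time.strftime('%Y%m%d%H%M%S %z')
end
```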
---------------------------- revision 1.46 date: 2007/01/24 13:47:52; author: ianmacd; state: Exp; lines: +4 -3 Locale warning contained a blank if $LANG was unset. This has been corrected. ---------------------------- revision 1.45 date: 2007/01/24 13:40:48; author: ianmacd; state: Exp; lines: +26 -10 Add --description and --capabilities, according to http://www.xmltv.org/wiki/xmltvcapabilities.html. ---------------------------- revision 1.44 date: 2007/01/12 20:29:13; author: ianmacd; state: Exp; lines: +45 -27 Handle exceptions that occur when trying to get the channel list from UPC, plus those that occur when we do the sample Nederland 1 page fetch. get_page() is the new method that does all of the page fetching. ---------------------------- revision 1.43 date: 2007/01/02 07:55:37; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.5.0. ---------------------------- revision 1.42 date: 2007/01/02 07:48:56; author: ianmacd; state: Exp; lines: +13 -11 When parsing for subtitles, a sentence may (erroneously, obviously) end with more than one punctuation character. We now catch this. When parsing for subtitles, a subtitle is judged the same as the title (and therefore removed) if it differs only in having a full-stop at the end. Catch Errno::ECONNREFUSED exceptions whilst fetching pages. Better exception reporting when we fail to fetch a page. ---------------------------- revision 1.41 date: 2007/01/01 21:26:24; author: ianmacd; state: Exp; lines: +6 -7 Remove unhelpful 'Niet beschikbaar' text at an earlier stage. ---------------------------- revision 1.40 date: 2006/12/31 06:23:18; author: ianmacd; state: Exp; lines: +25 -16 Allow colons when trying to derive a subtitle. Single letter words should be allowed in subtitles. Allow &, plus some articles, conjunctions and prepositions in subtitles. ---------------------------- revision 1.39 date: 2006/12/30 23:41:42; author: ianmacd; state: Exp; lines: +16 -16 Remove unnecessary whitespace. 
---------------------------- revision 1.38 date: 2006/12/30 23:39:23; author: ianmacd; state: Exp; lines: +63 -11 Use /x regular expression formatting to make the regex for deriving a subtitle from the description more legible. (?:\xC3[\x80-\xBF]) must be added to subtitle regex, because UPC TV guide pages are returned as UTF-8 and accented alphabetic characters are double byte. When --sanity-check is used, an additional check is made to see if we are running in the nl_NL locale. If not, a warning is issued. ---------------------------- revision 1.37 date: 2006/12/30 06:47:34; author: ianmacd; state: Exp; lines: +63 -13 An effort is now made to derive a suitable subtitle for each programme. If the programme's title contains one or more colons, we split on the first one. The left-hand side becomes the title, the right-hand side the subtitle. If that fails, we look to see whether the first sentence of the programme's description contains exclusively words that begin with a capital letter (digits and punctuation are also allowed). If so, we assume that it's actually an episode title and use that as the subtitle. The string is then removed from the description. If the subtitle happens to be the same string as the title, we abandon it. If the description consists of only 'Niet beschikbaar', we abandon it. ---------------------------- revision 1.36 date: 2006/12/28 15:07:44; author: ianmacd; state: Exp; lines: +31 -2 Use source-info-url, source-info-name and generator-info-url attributes in root tv tag to provide more information about our origin. Make an effort to include a basic credits section, by isolating the presenters of a programme, if the information is available. ---------------------------- revision 1.35 date: 2006/12/27 16:22:18; author: ianmacd; state: Exp; lines: +3 -2 New category translation: 'Talkshow' => 'Talk' ---------------------------- revision 1.34 date: 2006/12/14 01:07:49; author: ianmacd; state: Exp; lines: +3 -3 Update to 0.4.0.
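The first step of the subtitle derivation in revision 1.37 (splitting the title on its first colon and abandoning a subtitle identical to the title) can be illustrated as below. The method name is hypothetical, and the description-based fallback from that revision is not covered.

```ruby
# Splits "Title: Subtitle" on the first colon only.
# Returns [title, subtitle]; subtitle is nil when there is none
# or when it merely repeats the title.
def split_title(title)
  main, sub = title.split(':', 2).map(&:strip)
  sub = nil if sub.nil? || sub.empty? || sub == main
  [main, sub]
end
```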
---------------------------- revision 1.33 date: 2006/12/13 15:52:39; author: ianmacd; state: Exp; lines: +3 -2 New category translation: 'Musical' => 'Movies' ---------------------------- revision 1.32 date: 2006/12/13 02:45:48; author: ianmacd; state: Exp; lines: +29 -11 New option, --[no-]cattrans. cattrans is the default. If no-cattrans is used, programme category translation will not take place. Programme category translation is only useful in combination with MythTV and it turns out that some people are using tv_grab_nl_upc for other purposes, so logically this option is needed. ---------------------------- revision 1.31 date: 2006/12/12 20:31:52; author: ianmacd; state: Exp; lines: +4 -3 New category translation: 'Theater / dans' => 'Arts/Culture' ---------------------------- revision 1.30 date: 2006/12/11 01:45:51; author: ianmacd; state: Exp; lines: +3 -3 Catch EOFError exceptions when doing HTTP traffic. ---------------------------- revision 1.29 date: 2006/12/11 01:45:08; author: ianmacd; state: Exp; lines: +5 -5 Also print channel when displaying total number of programmes found for each channel. ---------------------------- revision 1.28 date: 2006/12/05 20:12:51; author: ianmacd; state: Exp; lines: +11 -2 --verbose will now also display the total running time and number of page fetches on exit. ---------------------------- revision 1.27 date: 2006/12/05 12:18:21; author: ianmacd; state: Exp; lines: +4 -4 Fix display bug when sample programme data fetch of NED 1 returns fewer than 5 programmes. ---------------------------- revision 1.26 date: 2006/12/02 19:03:49; author: ianmacd; state: Exp; lines: +11 -5 Displaying missing programme categories is only really useful for the programme author (i.e. me). Therefore, this report is now generated by the new --debug option, no longer by --verbose.
---------------------------- revision 1.25 date: 2006/12/02 12:01:48; author: ianmacd; state: Exp; lines: +6 -4 New category translations: Algemeen -> Misc Klussen -> HowTo Fixed category translations: Tuinieren -> HowTo (was: Educational) ---------------------------- revision 1.24 date: 2006/12/02 11:52:50; author: ianmacd; state: Exp; lines: +13 -7 The missing category report will now detail the programmes whose category was not recognised. This will aid in adding new categories to the code. ---------------------------- revision 1.23 date: 2006/11/29 20:49:03; author: ianmacd; state: Exp; lines: +22 -13 Add --tries option to allow the user to determine the number of HTTP requests we attempt for each page. The default is 3. As a consequence, get_tvguide() is back to taking 4 parameters. ---------------------------- revision 1.22 date: 2006/11/29 20:31:43; author: ianmacd; state: Exp; lines: +30 -10 Try to get each page a maximum of three times, catching HTTP timeouts that may occur. get_tvguide() now takes a fifth parameter, the status of options.quiet. ---------------------------- revision 1.21 date: 2006/11/28 16:31:07; author: ianmacd; state: Exp; lines: +5 -5 Strip trailing whitespace from channel name when reading config. ---------------------------- revision 1.20 date: 2006/11/28 14:11:46; author: ianmacd; state: Exp; lines: +11 -8 Missing data report should contain channel numbers as well as names. ---------------------------- revision 1.19 date: 2006/11/28 00:26:07; author: ianmacd; state: Exp; lines: +7 -4 Take into account 24 hour continuous programmes like the data display on the Weerkanaal. These are usually denoted as programme entries that run from midnight to midnight, which look like programmes with a 0 minute running time. ---------------------------- revision 1.18 date: 2006/11/28 00:03:14; author: ianmacd; state: Exp; lines: +4 -4 Display channel names, not numbers, in missing data report. 
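The retry behaviour from revisions 1.22 and 1.23 (attempt each page up to a configurable number of times, defaulting to 3, catching HTTP timeouts) follows a common Ruby pattern. The helper below is a generic sketch with a hypothetical name, not the script's own get_tvguide code.

```ruby
require 'timeout'

# Runs the block up to `tries` times, retrying only on timeouts.
# The last timeout is re-raised if every attempt fails.
def with_retries(tries = 3)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue Timeout::Error
    retry if attempt < tries
    raise
  end
end
```

A page fetch would then be wrapped as `with_retries(tries) { ... }`, with the block performing the actual HTTP request.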
---------------------------- revision 1.17 date: 2006/11/26 00:31:45; author: ianmacd; state: Exp; lines: +25 -15 Avoid exception caused by failure to find channel icon path. ---------------------------- revision 1.16 date: 2006/11/24 00:48:35; author: ianmacd; state: Exp; lines: +6 -6 Update to 0.3.0. ---------------------------- revision 1.15 date: 2006/11/23 00:30:04; author: ianmacd; state: Exp; lines: +363 -50 * New --schema option to select which chello.nl URL tree to pull data from. Previous versions used 1: http://www.chello.nl/Entertainment/TVGids/singlechannel/x/y/allday where x is the channel number and y is the day number for which we want the programmes, with 0 being today. As of now, we also offer 2: http://www.chello.nl/Entertainment/TV_gids/Zenders/Algemeen/Gids/?channels=x&timescope=y where x is the channel name (URL encoded, of course) and y is the day name for which we want the programmes, with the suffix _all appended. Today's programmes use 'today_all', tomorrow's use 'tomorrow_all', but days after that use 'monday_all', 'tuesday_all', etc. * Vastly expanded category translation table to cope with more detailed categories offered by URL schema 2. * By default, schema 2 is now used for fetching data, because it provides more detailed programme categories. Otherwise, the data is more or less the same as that obtained from schema 1. * Parts of the code have now been separated into methods to improve readability. These are get_available_channels, check_channel, read_config, get_tvguide, get_programmes and clean. * Because schema 2 uses channel names rather than numbers, the name of the channel given in the config file must match exactly that used by UPC within the chello.nl site.
For this reason, if schema 2 is used in combination with --sanity and --verbose, the program will pull the entire list of available UPC channels from http://www.chello.nl/cgi-bin/WebObjects/EPG.woa/wa/Events/?country=nl&template=Json_channelsGenres and check the channel names in the config file against this, making suggestions if certain channels cannot be matched. * The program will now abort if a sample page fetch for Nederland 1 for day 0 (today) yields fewer than 5 programmes. This would indicate a severe guide failure. * Channels are now processed in numeric order, starting with the lowest. The order was previously unpredictable. * --verbose will now report the number of programmes found per channel per day, as well as the total number for each channel, plus the total for all channels. * We now report any unknown programme categories found in the guide when --verbose is used. * --verbose will now print a report, containing details of which days contained no data for certain channels. If a channel yielded no programmes on any day, this fact will be emphasised. * If a schema yields no programmes for a channel on a certain day, we retry using the other schema. This is reported when --verbose is used. ---------------------------- revision 1.14 date: 2006/11/19 18:24:47; author: ianmacd; state: Exp; lines: +13 -15 Simplify screen-scraping, so that we just use String#scan to do all of the work. ---------------------------- revision 1.13 date: 2006/11/19 18:09:00; author: ianmacd; state: Exp; lines: +18 -18 Use puts instead of printf in most cases. ---------------------------- revision 1.12 date: 2006/11/19 00:14:43; author: ianmacd; state: Exp; lines: +7 -5 Off-by-one error in missing programme data detection. ---------------------------- revision 1.11 date: 2006/11/18 21:58:30; author: ianmacd; state: Exp; lines: +5 -6 Another bug in the missing data report. 
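The day naming used by URL schema 2 in revision 1.15 ('today_all', 'tomorrow_all', then English weekday names) can be derived from a day offset roughly as follows. The method name and the injectable `today` parameter are assumptions for illustration.

```ruby
require 'date'

# Maps a day offset (0 = today) to the schema 2 day-name value.
def timescope(offset, today = Date.today)
  case offset
  when 0 then 'today_all'
  when 1 then 'tomorrow_all'
  else "#{(today + offset).strftime('%A').downcase}_all"
  end
end
```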
---------------------------- revision 1.10 date: 2006/11/18 20:24:30; author: ianmacd; state: Exp; lines: +4 -4 The previous fix should have used printf, not puts. ---------------------------- revision 1.9 date: 2006/11/18 20:22:17; author: ianmacd; state: Exp; lines: +4 -4 Accidental use of abort instead of exit. ---------------------------- revision 1.8 date: 2006/11/18 19:52:30; author: ianmacd; state: Exp; lines: +8 -6 The check for yesterday's guide did not work. This has been fixed. When the guide is neither yesterday's nor today's, the error message now includes the date string from the guide, as this will aid troubleshooting. The missing programme data report produced by --verbose would print some headings, even when there was no missing data. This has been fixed. ---------------------------- revision 1.7 date: 2006/11/18 14:06:13; author: ianmacd; state: Exp; lines: +47 -2 If --verbose is used, the program will now produce a report at exit time, detailing which days had channels with no programme data and which of these channels had no data on any day. ---------------------------- revision 1.6 date: 2006/11/12 13:50:33; author: ianmacd; state: Exp; lines: +3 -3 Bump version to 0.2.0. ---------------------------- revision 1.5 date: 2006/11/12 13:48:15; author: ianmacd; state: Exp; lines: +13 -11 Do a better job of translating programme categories to what MythTV expects. ---------------------------- revision 1.4 date: 2006/09/18 16:14:14; author: ianmacd; state: Exp; lines: +25 -11 Added --sleep SECS option to sleep after each page fetch. The default amount of time is 1.0 seconds. ---------------------------- revision 1.3 date: 2006/09/18 15:51:10; author: ianmacd; state: Exp; lines: +45 -18 Added --xmltvid-suffix to add a string other than '.chello.nl' to the channel number to form the XMLTV ID. Added --version for displaying the program version. Improve usage message by giving default values. 
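An option such as --sleep SECS from revision 1.4, with its default of 1.0 seconds, could be declared with Ruby's standard OptionParser along these lines. The surrounding method and the options hash are hypothetical; only the option name and default come from the log.

```ruby
require 'optparse'

# Parses a copy of argv and returns the resulting options.
def parse_options(argv)
  options = { sleep: 1.0 } # default from revision 1.4
  OptionParser.new do |opts|
    opts.on('--sleep SECS', Float,
            'Seconds to sleep after each page fetch (default: 1.0)') do |secs|
      options[:sleep] = secs
    end
  end.parse!(argv.dup)
  options
end
```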
---------------------------- revision 1.2 date: 2006/09/18 13:14:43; author: ianmacd; state: Exp; lines: +83 -25 Added --sanity-check option. If this is used, sanity checks will be made before pulling guide data. Currently, the only check is to ascertain that the guide is the correct one for the day at run-time. If running shortly after midnight, for example, it's possible that the guide will still be for yesterday. If that's the case, offset adjustments are made and the correct guide data is still pulled, unless the program is run after 05:00, by which time guide rotation really should have occurred. In that case, we abort. Even with the offset adjustments, however, we will still have a big problem if the guide happens to be rotated _during_ execution. It's best to avoid this race condition entirely by running the program later in the day. If the guide has a discrepancy of more than 1 day, we abort, as there's no good explanation for this. ---------------------------- revision 1.1 date: 2006/09/17 16:16:25; author: ianmacd; state: Exp; branches: 1.1.1; Initial revision ---------------------------- revision 1.1.1.1 date: 2006/09/17 16:16:25; author: ianmacd; state: Exp; lines: +0 -0 Create ruby-xmltv repo. Version 0.1.0 of grabber. =============================================================================