Commit graph

68 commits

Author SHA1 Message Date
yggverse 982d1f1246 update yggverse/net version 2024-04-06 02:42:41 +03:00
yggverse 8c9ef3bd5d fix status code reset #13 2024-04-03 18:21:00 +03:00
yggverse c492a98094 use yo-tools-php library 2024-04-03 18:18:12 +03:00
yggverse 488e090f97 compare compressed snap file size instead of document download size 2024-04-01 15:55:00 +03:00
yggverse f8c8aacf95 fix size data type conversion 2024-04-01 15:48:39 +03:00
yggverse 231b7be50d remove debug 2024-03-28 22:01:34 +02:00
yggverse 3fbde71313 fix variable name 2024-03-28 21:59:01 +02:00
yggverse 22156d230d fix snaps filename selection 2024-03-28 21:42:37 +02:00
yggverse c618714cd2 fix regex rule 2024-03-27 05:10:01 +02:00
yggverse 21c6eb18dc fix variable name 2024-03-27 05:02:14 +02:00
yggverse d07025e5ee fix str_starts_with attribute 2024-03-27 05:00:47 +02:00
yggverse a3f2ab0aa2 use document ID as the snap location 2024-03-27 04:57:35 +02:00
yggverse 27564c4fbc add collision events debug 2024-03-27 04:27:49 +02:00
yggverse 5705f452cc update config folding 2024-03-24 18:26:59 +02:00
yggverse d44bf90fe3 stop crawler on network connection lost #11 2024-03-24 18:15:26 +02:00
yggverse b475b4e61b remove mime update on progress function execute #10 2024-03-23 16:21:57 +02:00
yggverse 686479e7f1 disable ranked pages index first 2024-03-23 15:55:09 +02:00
yggverse 0872e66e15 remove global constant declaration 2024-03-23 15:47:59 +02:00
yggverse 7cf10079c6 update mime on progress function event 2024-03-23 03:31:27 +02:00
yggverse 3a28bf5967 reset index time 2024-03-23 03:26:25 +02:00
yggverse 722de9175a reset index time 2024-03-23 03:25:36 +02:00
yggverse 62149220b9 update http code even progress function fails 2024-03-23 03:16:01 +02:00
yggverse 34fe26fcf9 disable document autodelete 2024-03-23 03:15:01 +02:00
yggverse c4df3f3237 improve notice level debug 2024-03-23 01:00:49 +02:00
yggverse 3a9efeabc5 add snaps update by timeout feature 2024-03-23 00:47:08 +02:00
yggverse ebeef559ba rename index action dependencies 2024-03-22 23:50:39 +02:00
yggverse fef2b1abec implement reindex by request feature 2024-03-22 22:50:52 +02:00
yggverse fae43d54e5 enable xhtml parser 2024-03-22 19:11:27 +02:00
yggverse f2dbd1599c fix tags replacement condition 2024-03-22 03:02:57 +02:00
yggverse 5e4494c9e8 use PHP 8 str_starts_with function 2024-03-21 18:47:11 +02:00
yggverse 900e3a453f Disable keywords collection from headers as body index enabled 2024-03-21 03:46:58 +02:00
yggverse 1f3ee435e9 fix custom encoding conversion 2024-03-21 03:38:46 +02:00
yggverse e09440b44a strip code content 2024-03-21 00:38:24 +02:00
yggverse b5cd219f47 strip css content from index 2024-03-21 00:34:25 +02:00
yggverse 3884f375d4 save document body text to index 2024-03-20 19:31:56 +02:00
ghost 1c2e8dafb2 collect keywords from document headers 2024-01-23 02:49:52 +02:00
ghost cfbc84cbaf sort queue by rank asc 2024-01-23 02:19:35 +02:00
ghost db9dc8d4ba force results to string 2024-01-23 01:55:28 +02:00
ghost 50dc9d315a add rank field 2024-01-22 22:56:36 +02:00
ghost 6f4abe4729 set crc32url as document id 2024-01-22 22:52:37 +02:00
ghost 93baed4b90 delete deprecated documents with HTTP code not 200 on second scan 2023-12-20 08:44:35 +02:00
ghost 33cc778999 crawl newest pages by rand in queue 2023-12-10 00:29:18 +02:00
ghost 35ad144a9e add stripos url rules for crawl snaps 2023-12-02 22:15:44 +02:00
ghost 0e06ff3c0f fix debug message 2023-12-02 21:18:57 +02:00
ghost 51d52dea7d fix destination name 2023-12-02 20:12:03 +02:00
ghost 87ca594860 add debug levels 2023-12-02 16:04:22 +02:00
ghost 33d657cb72 apply sleep on timeout value provided only 2023-12-02 15:03:51 +02:00
ghost bc00f0c851 make tmp subfolders storage optimization 2023-12-02 14:39:11 +02:00
ghost f613b44d3f disable sort by RAND() in crawler queue 2023-12-02 14:22:50 +02:00
ghost d3f8d1c0e3 fix result output 2023-11-30 02:59:07 +02:00