Welcome to the Question2Answer Q&A. There's also a demo if you just want to try it out.
+3 votes
1.7k views
in Q2A Core by
edited by

I have imported my old database into questions2answers but indexing takes about 5 to 10 DAYs, so I am searching for a solution for this problem.

Database:

Total questions: 416,613
From users: 114,345
From anonymous: 302,268
 
Total answers: 1,479,042
From users: 864,922
From anonymous: 614,120

 

Dedicated Server:

 

  • Intel Core i7-2600 Quad Core
  • 32 GB DDR3 RAM
  • 2x 3 TB SATA III HDD 

When the reindexing is running I have a CPU usage about 0-6% and memory usage about 5.6%.

I don't unserstand why the usage is not higher (when I make a mysqlcheck of the tables the cpu usage raises up to 100%.

I already emptied the tables from other performance improvements post (qa_words ...) to optimize the reindiexing.

Is there a possibility to speed up the reindexing or make it with an shellscript via another server, most of the time I get timeouts or my computer reboots while reindexing via the admin page.

Perhaps changing some values in the php.ini will help, just an idea?

Hope you can help me, thanks

 

Fabian

Q2A version: 1.6.2
by
1. How many records does the ^posts table have? It seems it is at least 2M
2. After running for so long, does the process finish successfully?
3. What do you mean to say your computer reboots? If that happens then there is an issue with your computer for sure
4. Before starting the reindexing process press F12 (usually the key for the browser's inspector) and check the network tab
   4.1. Do you get any error there?
   4.2. How many posts are processed per bunch of posts?
   4.3. How much does it take to process a bunch of posts?
5. Why are you reindexing and what result do you expect from doing that?
by
It seems that important (concrete) operation about your importing (re-indexing) is not described in your question. It may be difficult to get effective advice without these information.
by
edited by
1. How many records does the ^posts table have? It seems it is at least 2M: yes thats is correct.
2. After running for so long, does the process finish successfully? Yes if the browser window which has started the reindexing is still open.
3. What do you mean to say your computer reboots? If that happens then there is an issue with your computer for sure. Yes this is a problem of the computer (in fact of Updates or other things) but more than one week is a long time for reindexing. And if there is an error with the client computer I have to restart the full reindex, that costs a lot of time. Sometimes the browser gets an timeout...
4. Before starting the reindexing process press F12 (usually the key for the browser's inspector) and check the network tab
   4.1. Do you get any error there? No
   4.2. How many posts are processed per bunch of posts? 10-30
   4.3. How much does it take to process a bunch of posts? 1-4 sec
5. Why are you reindexing and what result do you expect from doing that?
I have to reindex because after the import of data we do not have these data in the table (to show related posts ...)

Perhaps there is a way for a increment indexing, one button for full reindex and one for indexing all not indexed posts (start after the last indexed post)
Or a way to start the indexing from the server itself (I don't run an x server on this machine) or via sql ...

3 Answers

+4 votes
by

I'm still not sure which particular step of the reindex content is actually taking more: the post reindex or the word count. I guess it is the former one.

If your server can handle it you could try processing in bigger bunches.

1. Locate this line and turn 60 into 90

2. Locate this line and turn 2 into 10

3. In order to increment the size of posts bunches, locate this line and increment the number 10 to 100 (you could even try higher numbers)

4. In order to increment the size of posts bunches, locate this line and increment the number 1000 to 10000 (you could even try higher numbers)

Retry the process and time it. Try to monitor server CPU and disk usage, if possible.

by
Hi pupi1985 and thanks for your answer,

I tried different numbers as you wrote but there is not really a performance improvement. The CPU and MEM usage is still between 2 and 5 % and it "feels" like the recalculating is not realy faster now.

Do you have any more suggestions?

Thanks for your answer
Fabian
by
Regardless of server usage, exactly how many hours did the process take to finish in the original approach and how many hours did it take after changing those numbers?
by
edited by
I can't tell it, because the last run was not ended till the last 4 days (with the original numbers). I think a complete run will take 7 or 8 days. I have startet a new run one hour ago but can't tell you how long it will run.
by
I tried it out a little bit but it seems that these numbers are not the bottleneck. I changed now the server to a server with a ssd as hard disk and this seems to increase the performance by the factor 4. On the old server with normal hard disk the reindexing will take 8 days on the new one it will take 1,5 to 2 days I think.
If you have any ideas to increase the performance I'd like to read them, thanks.
by
Very interesting. Hard drive issues. I wouldn't have expected that in 2015 :) Apart from changing these settings I can't think of anything else. I wonder how much difference it makes to run the process with the default values and with the "tuned" ones with the new hardware.

Also, if the process has already finished. Why are you caring about the speed? Are you planning to run it again? Why would you?
by
Sorry, this is a missunderstanding, it is not a Hard drive issue, it is the difference between an old magnetic hard drive and a new ssd hard drive.

The process is not already finished, my time estimations are only calculations. I never had a complete run, but hope now I will get one after two days.

All my different tests with different tunings showed me that there is no performance increase with the tuning. Also not with tuning the database itself with mysqltuner (play around with caches...). I tested a lot but nothing increased the performance of the reindexing in a "feelable" way.
by
Did you read pages below? When InnoDB buffer settings are small, CPU may not work effectively.
http://dev.mysql.com/doc/refman/5.0/en/innodb-configuration.html
https://dev.mysql.com/doc/refman/5.0/en/innodb-tuning.html
http://www.question2answer.org/qa/24755/database-error-in-1-6-beta-2
In addition, since re-indexing process on admin panel is based on Ajax, it will be slow. After increasing session timeout, it will be fast that you run special program to re-make word table.
+2 votes
by

i was facing the same issue i have 700,000 posts and always get time out and i had to start from the begining so idecided to do something

i have customoze the script to make it resume from where it was finished not from the start

so if ht escript timedout at post id =  50,000  next time it will start feom post id = 50,001

do make that happen you must do th follwing

  1. in your PHPmyadin
  2. choose the datbase
  3. choose qa_options
  4. click insert
  5. and add this to the title reindex_lastpostid
  6. open this file qa-include\app\recalc.php
  7. and from line 129 which start with case 'doreindexcontent_postreindex': 
  8. to line 148  contains $continue=true;
  9. replace with this code
            case 'doreindexcontent_postreindex':
                                $next = qa_opt('reindex_lastpostid');
                $posts=qa_db_posts_get_for_reindexing($next, 100);

                if (count($posts)) {
                    require_once QA_INCLUDE_DIR.'app/format.php';

                    $lastpostid=max(array_keys($posts));

                    qa_db_prepare_for_reindexing($next, $lastpostid);
                    qa_suspend_update_counts();

                    foreach ($posts as $postid => $post) {
                        qa_post_unindex($postid);
                        qa_post_index($postid, $post['type'], $post['questionid'], $post['parentid'], $post['title'], $post['content'],
                            $post['format'], qa_viewer_text($post['content'], $post['format']), $post['tags'], $post['categoryid']);
                    }

                    $next=1+$lastpostid;
                                        qa_opt('reindex_lastpostid', $next);
                    $done = qa_db_count_rest_of_posts($next);
                    $continue=true;

and open file qa-include\db\recalc.php

and add this to the end

    function qa_db_count_rest_of_posts()

    {
                @$next_post_id =  qa_opt('reindex_lastpostid');
        return qa_db_read_one_value(qa_db_query_sub(
            'SELECT COUNT(*) FROM qa_posts WHERE postid < #',
                        $next_post_id
                        
        ));
    }

when you upgrade make sure you edit these 2 files

i hope if someone can create a plugin instead of that custome code it will be greate

 

+2 votes
by

The fastest way to reindex all the content is to first remove everything from the qa_words and qa_contentwords tables. Step by step:

1. Under Admin > General tick "Take site down for temporary maintenance" and save. This way there is no problem with users adding posts while you are indexing.

2. Then go into your database and run these two queries:

TRUNCATE TABLE qa_contentwords;
TRUNCATE TABLE qa_words;

3. Then go to Admin > Stats and click the reindex button.

by
when i use cmd "TRUNCATE TABLE qtoaoi_words",it report  failed,how to fix?


#1701 - Cannot truncate a table referenced in a foreign key constraint (`zaizhyaq_ostack`.`qtoaoi_contentwords`, CONSTRAINT `qtoaoi_contentwords_ibfk_2` FOREIGN KEY (`wordid`) REFERENCES `zaizhyaq_ostack`.`qtoaoi_words` (`wordid`))
...