| |||
| Google stopped indexing my wikipedia mirror I recently integrated Wikipedia with my site, using two approaches. The first is linking individual wiki pages into my algebra modules. The links in those pages point to the real Wikipedia, but JavaScript in them would redirect a reader who clicks on them to my site. This lets users who click on these links stay within one algebra module. I am not concerned about that case. The second is a full cross-linked Wikipedia mirror under one particular directory. I already get quite a few Google-directed hits to various pages there. However, I keep track of how many Wikipedia pages Googlebot is visiting, and it has visited only a small fraction of what is out there. At some point I fed Google several big files with links to all articles, which it promptly read, and it even followed some of the links (I think). Some time later, the visits stopped. The pages that Google did read are still reachable through search engines. I am talking about tens or hundreds of thousands of articles; Google indexed mere thousands. Full credit is given to Wikipedia and I fully follow the GFDL license. My question is: is there something that prevents Google from following up on this? Any ideas will be appreciated. The pages with links contain 5,000 links each; there are 289 such pages plus the master list. i |
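
For readers wondering what the link pages described above might look like, here is a rough sketch in Python. It is not the poster's actual script: the function names, the `links-N.html` file naming, and the bare-bones HTML are all assumptions made for illustration. Only the figure of 5,000 links per page comes from the post.

```python
from html import escape


def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_index_pages(urls, per_page=5000):
    """Return (master_html, [page_html, ...]) for a list of article URLs.

    Each link page holds at most `per_page` links; the master list links
    to every link page. File names (links-1.html, ...) are hypothetical.
    """
    pages = []
    for group in chunk(urls, per_page):
        links = "\n".join(
            f'<a href="{escape(u)}">{escape(u)}</a><br>' for u in group
        )
        pages.append(f"<html><body>\n{links}\n</body></html>")

    master_links = "\n".join(
        f'<a href="links-{n}.html">Links page {n}</a><br>'
        for n in range(1, len(pages) + 1)
    )
    master = f"<html><body>\n{master_links}\n</body></html>"
    return master, pages
```

Writing `master` and each entry of `pages` to disk would give a crawler a single entry point from which every article is at most two clicks away. Whether a search engine chooses to follow all of those links is a separate matter.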
| |||
| Re: Google stopped indexing my wikipedia mirror Ignoramus29781 wrote: > I recently integrated wikipedia with my site Uh huh, along with a bazillion other folks. The problem is that Google does not really want to bother with all these Wikipedia mirrors, so it runs duplicate-page detection algorithms. Maybe the indexer decided that your pages were just duplicates of other content and told Googlebot not to spider those links anymore. That would make sense to me, as spidering and indexing pages that are of no benefit to searchers just wastes Google's resources. |
| |||
| Re: Google stopped indexing my wikipedia mirror On Sun, 08 May 2005 09:42:25 +0200, davidof <david.george@g-dumpthisbit-mail.com> wrote: > Ignoramus29781 wrote: >> I recently integrated wikipedia with my site > > uh huh, along with a bazillion other folk. The problem is that Google > does not really want to bother with all these Wikipedia mirrors so runs > duplicate page algorithms. Maybe the indexer decided that your pages > were just duplicates of other content and told the googlebot not to > spider those links anymore. That would make sense to me as spidering and > indexing pages that are of no benefit to searchers just wastes Google's > resources. Surely, that makes sense. Possibly, it will happen sooner or later. I began by mirroring individual Wikipedia pages for math-related content, to complement the math pages that I already had. Then I decided to mirror Wikipedia in a search-engine-friendly way, since all the pieces were already in place. In any case, Googlebot is back and is busy indexing my pages. It varies by day. I fully comply with the Wikipedia license, giving credit, referring to the GFDL, etc. i |
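
The day-to-day variation in Googlebot visits mentioned in this thread can be checked from a web server access log. The following is a minimal sketch, assuming the log is in Apache/NCSA Combined Log Format and that the mirror lives under a `/wiki/` prefix. Both are assumptions, since the thread names neither the server nor the directory, and the function name is hypothetical too.

```python
import re
from collections import Counter

# Combined Log Format (assumed), e.g.:
# 1.2.3.4 - - [08/May/2005:09:42:25 +0200] "GET /wiki/X HTTP/1.1" 200 512 "-" "UA"
LOG_RE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] '      # timestamp; keep only the date
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" '          # request line
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'           # status, size, referer, user agent
)


def googlebot_hits_per_day(lines, prefix="/wiki/"):
    """Count requests per day whose user agent mentions Googlebot.

    `prefix` is a hypothetical mirror directory. Matching on the user-agent
    string alone can be fooled by clients that impersonate Googlebot.
    """
    hits = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua") and m.group("path").startswith(prefix):
            hits[m.group("day")] += 1
    return hits
```

Passing an open log file (for example `googlebot_hits_per_day(open("access.log"))`, where the file name is again an assumption) yields a per-day tally that shows whether the crawler has stopped or merely slowed down.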