Five Kinds of Code Comments You Should Avoid

I. Narcissistic Comments

(Note: the original article calls these "proud" comments; I think "narcissistic" fits better.)

    using System;

    public class Program
    {
        static void Main(string[] args)
        {
            string message = "Hello World!";  // 07/24/2010 Bob
            Console.WriteLine(message); // 07/24/2010 Bob
            message = "I am so proud of this code!"; // 07/24/2010 Bob
            Console.WriteLine(message); // 07/24/2010 Bob
        }
    }

 

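Original article: This kind of programmer is so proud of their changes that they feel the need to sign every line they touch. In fact, a version control tool (such as CVS or Subversion) already records every change and who made it; the information is just less visible there than in the code.

My view: I agree with the original article. This happens on my team too. For a while I seriously considered whether to stamp the practice out. In the end I decided that allowing it is not necessarily a bad thing, for two reasons:

  1. Keeping the programmers on your team motivated may matter more. If signing code makes a programmer feel proud and gives them a sense of achievement, why stop it? It is not a big problem.
  2. It encourages a sense of responsibility. A programmer who dares to put their name in the code is confident in it and wants to show what they can do. They also know that if the code turns out to be broken, their reputation suffers. So signing it signals a willingness to take full responsibility for the code, which is exactly what we want.

So, from a purely technical standpoint I think these comments are bad, but from the standpoint of team motivation and management they may be fine. I neither forbid nor encourage them. What matters is whether they lead to a better outcome.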

 

 

II. Comments Around Dead Code

 
    public class Program
    {
        static void Main(string[] args)
        {
            /* This block of code is no longer needed
             * because we found out that Y2K was a hoax
             * and our systems did not roll over to 1/1/1900 */
            //DateTime today = DateTime.Today;
            //if (today == new DateTime(1900, 1, 1))
            //{
            //    today = today.AddYears(100);
            //    string message = "The date has been fixed for Y2K.";
            //    Console.WriteLine(message);
            //}
        }
    }

 

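Original article: If a piece of code is no longer used, delete it. Comments should not be used to mark dead code. Again, version control manages our source, and nothing committed to it is ever physically deleted, so you can always recover the code from an earlier revision.

My view: I strongly agree. Code that is dead should be deleted, not commented out; comments are not a deletion mechanism. You may argue that in iterative development some code is likely to be needed later but cannot be used right now, so you comment it out and re-enable it some day. In that case commenting it out is fine, but be clear that such code is temporarily unused, not dead. So do not ban commented-out code dogmatically; the key question is whether the code is really dead.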

 

 

III. Comments That State the Obvious

    using System;

    public class Program
    {
        static void Main(string[] args)
        {
            /* This is a for loop that prints the
             * words "I Rule!" to the console screen
             * 1 million times, each on its own line. It
             * accomplishes this by starting at 0 and
             * incrementing by 1. If the value of the
             * counter equals 1 million the for loop
             * stops executing.*/
            for (int i = 0; i < 1000000; i++)
            {
                Console.WriteLine("I Rule!");
            }
        }
    }

 

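Original article: Look at the example above: the code is easier to read than the comment. We are all programmers, and simple, self-evident logic needs no comment. You do not need to teach programming in your comments; explaining the obvious wastes time. Use comments to explain what the code is for, why it exists, and the thinking behind it, not the code itself.

My view: I agree completely. Ideally the code is so direct and readable that it explains itself and needs no comments at all; that is the highest level. A comment should say what the code below it does, why it is needed, and how its main algorithm is designed, rather than narrate how each line works. Many novice programmers get this wrong. I would add that comments should not be excessive: if you find yourself writing a lot, write documentation instead.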

 

 

IV. Storytelling Comments

 
    using System;

    public class Program
    {
        static void Main(string[] args)
        {
            /* I discussed with Jim from Sales over coffee
             * at the Starbucks on main street one day and he
             * told me that Sales Reps receive commission
             * based upon the following structure.
             * Friday: 25%
             * Wednesday: 15%
             * All Other Days: 5%
             * Did I mention that I ordered the Caramel Latte with
             * a double shot of Espresso?
             */
            double price = 5.00;
            double commissionRate;
            double commission;
            if (DateTime.Today.DayOfWeek == DayOfWeek.Friday)
            {
                commissionRate = .25;
            }
            else if (DateTime.Today.DayOfWeek == DayOfWeek.Wednesday)
            {
                commissionRate = .15;
            }
            else
            {
                commissionRate = .05;
            }
            commission = price * commissionRate;
        }
    }

 

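Original article: If you really must mention requirements in a comment, do not mention people by name. The comment above seems to tell other programmers that the logic below came from Jim in Sales and that they can ask him if anything is unclear. There is no need to put things unrelated to the code in a comment.

My view: I could not agree more. This is code; do not mix in things that have nothing to do with it. You might object on two grounds:

  1. Sometimes a so-called "expert" forced me to do it this way, so I put his name here to let everyone see how stupid the idea was.
  2. Sometimes I do not fully understand the requirement, so we should leave a contact here for whoever needs to ask.

The first is an emotional reaction. If your superior proposes something stupid, try hard to talk it through and explain your view. If that fails, ask your manager or the expert to put the idea and the design formally in a document or an email, and then implement it. When something goes wrong, that document or email is your evidence, rather than something you wrote in the code.

As for the second, contacts for a requirement belong in the requirements document. If someone gives you a requirement one day, record it there, not in your code. Learn to manage your work with a process, not with comments.

Finally, storytelling comments do have exceptions. Some people on our team like to put famous quotes in comments or documents (for example from "22 Classic Programming Quotes", "More Programming Quotes", and "Linus Torvalds Quotes Top 10"), or even small jokes and witty one-liners. I do not encourage this, but if it helps build team culture, makes the work more interesting, and lets people read and write code in a relaxed mood, is that not a good thing?

Also, as a manager you should occasionally read your programmers' comments, because they may reveal what programmers really think and feel (do programmers have the foulest mouths?). Knowing that helps you manage.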

 

 

V. "TODO" Comments

 
    using System;

    public class Program
    {
        static void Main(string[] args)
        {
            //TODO: I need to fix this someday – 07/24/1995 Bob
            /* I know this error message is hard coded and
             * I am relying on a Contains function, but
             * someday I will make this code print a
             * meaningful error message and exit gracefully.
             * I just don’t have the time right now.
             */
            string message = "An error has occurred";
            if (message.Contains("error"))
            {
                throw new Exception(message);
            }
        }
    }

 

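Original article: TODO comments are very useful when you are starting a project. But if such a comment has sat in production source for years, something is wrong. If there is a bug to fix, fix it; there is no need to leave a TODO.

My view: Yes. TODO is a good marker only in a project that has not been released. If the product has shipped and the code still contains TODOs, something is wrong. You may argue that it is work for the next version. Fine, but then track it with project management or requirements management, not with code comments. Shortly before a release, walk through the TODO markers in your code and decide for each whether to do it now or later. If later, move it into your project or requirements management process.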
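
The pre-release walk-through of TODO markers can be partly automated. Below is a minimal Python sketch (Python is my choice here, not the article's; `find_todos` and `TODO_PATTERN` are hypothetical names) that lists TODO markers found in `//`, `#`, or `/*` comments. It assumes the source is passed in as a string and does not try to parse the language properly.

```python
import re

# Matches a comment opener followed by TODO, then captures the rest of the line.
TODO_PATTERN = re.compile(r"(?://|#|/\*)\s*TODO\b[:\s]*(.*)")

def find_todos(source):
    """Return (line_number, note) pairs for comment lines marked TODO."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1).strip()))
    return hits

sample = 'int x = 1;\n//TODO: I need to fix this someday\nstring s = "TODO";\n'
for number, note in find_todos(sample):
    print(number, note)
```

Each hit can then be resolved on the spot or moved into whatever tracker the team uses.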

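Those are the comments you should avoid, along with some of my own views. You are welcome to leave yours!

Best Programming Quotes

This site has previously published "22 Classic Programming Quotes", "More Programming Quotes", "Linus Torvalds Quotes", and "Ten Good Views on Programming". Today we present the "best programming quotes". Every one of them is excellent and, like our sun, lights the way (which is why we picked a red image, hoping it gets through the wumao's online censorship). Only one or two have appeared in earlier posts on this site. The source was linked in the original post, which paired each quote with a Chinese translation by "Neo" and "Chen Hao"; the quotes appear here in their original English. Corrections are welcome.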

A good programmer is someone who looks both ways before crossing a one-way street. — Doug Linder, systems administrator

A most important, but also most elusive, aspect of any tool is its influence on the habits of those who train themselves in its use. If the tool is a programming language this influence is, whether we like it or not, an influence on our thinking habits. — Edsger Dijkstra, computer scientist

Being abstract is something profoundly different from being vague… The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise. — Edsger Dijkstra

Besides a mathematical inclination, an exceptionally good mastery of one’s native tongue is the most vital asset of a competent programmer. — Edsger Dijkstra

C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg. — Bjarne Stroustrup, developer of the C++ programming language

Commentary: most debugging problems are fixed easily; identifying the location of the problem is hard. — unknown

Considering the current sad state of our computer programs, software development is clearly still a black art, and cannot yet be called an engineering discipline. — Bill Clinton, former President of the United States

For a long time it puzzled me how something so expensive, so leading edge, could be so useless, and then it occurred to me that a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match. — Bill Bryson, author, from Notes from a Big Country

Given enough eyeballs, all bugs are shallow (e.g., given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone). — Eric S. Raymond, programmer and advocate of open source software, from The Cathedral and the Bazaar

Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’ Improve the code and then document it to make it even clearer. — Steve McConnell, software engineer and author, from Code Complete

Hey! It compiles! Ship it! — unknown

Inside every well-written large program is a well-written small program. — Charles Antony Richard Hoare, computer scientist

It should be noted that no ethically-trained software engineer would ever consent to write a DestroyBaghdad procedure. Basic professional ethics would instead require him to write a DestroyCity procedure, to which Baghdad could be given as a parameter. — Nathaniel S. Borenstein, computer scientist

Managing programmers is like herding cats. — unknown

Measuring programming progress by lines of code is like measuring aircraft building progress by weight. — Bill Gates, co-founder of Microsoft Corporation

More good code has been written in languages denounced as bad than in languages proclaimed wonderful — much more. — Bjarne Stroustrup, from The Design and Evolution of C++

Programs must be written for people to read, and only incidentally for machines to execute. — Harold Abelson and Gerald Jay Sussman, computer scientists and authors, from The Structure and Interpretation of Computer Programs

Real programmers don’t comment their code. If it was hard to write, it should be hard to understand. — unknown

Simplicity is prerequisite for reliability. — Edsger Dijkstra

The C programming language — a language which combines the flexibility of assembly language with the power of assembly language. — unknown

The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time. — Tom Cargill, object-oriented programming expert at Bell Labs

The important point is that the cost of adding a feature isn’t just the time it takes to code it. The cost also includes the addition of an obstacle to future expansion. Sure, any given feature list can be implemented, given enough coding time. But in addition to coming out late, you will usually wind up with a codebase that is so fragile that new ideas that should be dead-simple wind up taking longer and longer to work into the tangled existing web. The trick is to pick the features that don’t fight each other. — John Carmack, computer game programmer

The key to performance is elegance, not battalions of special cases. The terrible temptation to tweak should be resisted unless the payoff is really noticeable. — Jon Bently and M. Douglas McIlroy, both computer scientists at Bell Labs

The last good thing written in C was Franz Schubert’s Symphony Number 9. — Erwin Dieterich, programmer

The problem with using C++ … is that there’s already a strong tendency in the language to require you to know everything before you can do anything. — Larry Wall, developer of the Perl language

The sooner you start to code, the longer the program will take. — Roy Carlson, University of Wisconsin

The value of a prototype is in the education it gives you, not in the code itself. — Alan Cooper, software author, from The Inmates are Running the Asylum

There are only two kinds of programming languages: those people always bitch about and those nobody uses. — Bjarne Stroustrup

There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. — Charles Antony Richard Hoare

Ugly programs are like ugly suspension bridges: they’re much more liable to collapse than pretty ones, because the way humans (especially engineer-humans) perceive beauty is intimately related to our ability to process and understand complexity. A language that makes it hard to write elegant code makes it hard to write good code. — Eric S. Raymond

Weeks of programming can save you hours of planning. — unknown

(Note: the quote is ironic. Skipping a few hours of planning can easily cost you weeks of programming.)

When a programming language is created that allows programmers to program in simple English, it will be discovered that programmers cannot speak English. — unknown

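How Programming Time Is Spent

Below is a chart of how a programmer's coding time is spent (the original image was linked from the original post).

[Figure: programming time allocation chart]

Thinking is an important part of the process. Procrastination may also simply mean you have not thought things through; smoking or drinking coffee is arguably a form of thinking; eating something helps the brain run faster; searching the web for inspiration lets you borrow other people's ideas. Complaining about writing comments is just one example; more often the complaints are about overtime or the boss.

If I were to add anything, it would be "refactoring", "compiling", and "debugging", though these can all be counted as coding. More importantly, I think the chart should include "meetings", "arguing/explaining", and "interruptions", which take up a large share as well.

So here is a version I personally consider more realistic:

[Figure: programming time chart (CoolShell version)]

What does your programming time allocation look like?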

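140 Google Interview Questions

Source: http://blog.seattleinterviewcoach.com/2009/02/140-google-interview-questions.html (blocked by the Great Firewall)

A headhunter collected more than 140 Google interview questions and posted them on his blog, mainly for the positions below. Because the blog is blocked and contains nothing sensitive, I have copied the original text here.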
  • Product Marketing Manager
  • Product Manager
  • Software Engineer
  • Software Engineer in Test
  • Quantitative Compensation Analyst
  • Engineering Manager
  • AdWords Associate

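The blog lists questions Google has used when interviewing for these positions. Many are not easy to answer, but they are classic and devious, in the style of companies like Google, Microsoft, and Amazon. I have not translated them, because I believe the questions are best left in English. For some I have added notes; they are not necessarily right, but I hope they help. If a question leaves you stumped, search for it on Google; StackOverflow or Wikipedia may well have a thorough answer.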

Product Marketing Manager
  • Why do you want to join Google?
  • What do you know about Google’s product and technology?
  • If you are Product Manager for Google’s Adwords, how do you plan to market this?
  • What would you say during an AdWords or AdSense product seminar?
  • Who are Google’s competitors, and how does Google compete with them?
  • Have you ever used Google’s products? Gmail?
  • What’s a creative way of marketing Google’s brand name and product?
  • If you are the product marketing manager for Google’s Gmail product, how do you plan to market it so as to achieve 100 million customers in 6 months?
  • How much money you think Google makes daily from Gmail ads?
  • Name a piece of technology you’ve read about recently. Now tell me your own creative execution for an ad for that product.
  • Say an advertiser makes $0.10 every time someone clicks on their ad. Only 20% of people who visit the site click on their ad. How many people need to visit the site for the advertiser to make $20?
  • Estimate the number of students who are college seniors, attend four-year schools, and graduate with a job in the United States every year.

Product Manager
  • How would you boost the GMail subscription base?
  • What is the most efficient way to sort a million integers?  (merge sort)
  • How would you re-position Google’s offerings to counteract competitive threats from Microsoft?
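  • How many golf balls can fit in a school bus? (Questions like this test your problem-solving approach. Note that you cannot simply treat a golf ball as a small cube: it is a sphere, and stacked balls sit offset from each other, so the centers of three adjacent balls form an equilateral triangle.)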
  • You are shrunk to the height of a nickel and your mass is proportionally reduced so as to maintain your original density. You are then thrown into an empty glass blender. The blades will start moving in 60 seconds. What do you do?
  • How much should you charge to wash all the windows in Seattle?
  • How would you find out if a machine’s stack grows up or down in memory?
  • Explain a database in three sentences to your eight-year-old nephew. (This one tests how well you can explain things.)
  • How many times a day does a clock’s hands overlap? (The classic clock problem.)
  • You have to get from point A to point B. You don’t know if you can get there. What would you do?
  • Imagine you have a closet full of shirts. It’s very hard to find a shirt. So what can you do to organize your shirts for easy retrieval? (A nice question. Do not assume categorized lookup is easy; think about how a library classifies and retrieves books. Also think about how you would implement the equivalent of a hash table or a tree in your closet.)
  • Every man in a village of 100 married couples has cheated on his wife. Every wife in the village instantly knows when a man other than her husband has cheated, but does not know when her own husband has. The village has a law that does not allow for adultery. Any wife who can prove that her husband is unfaithful must kill him that very day. The women of the village would never disobey this law. One day, the queen of the village visits and announces that at least one husband has been unfaithful. What happens? (A rather adult-rated and very funny question. Note the recursive reasoning among the wives. Problems like this are classic distributed communication problems; search online for more.)
  • In a country in which people only want boys, every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country? (My first reaction: this country is China. It is a probability question; no matter how families decide when to stop, each birth is still 50% likely to be a boy, so the ratio stays 1:1.)
  • If the probability of observing a car in 30 minutes on a highway is 0.95, what is the probability of observing a car in 10 minutes (assuming constant default probability)?
  • If you look at a clock and the time is 3:15, what is the angle between the hour and the minute hands? (The answer to this is not zero!)
  • Four people need to cross a rickety rope bridge to get back to their camp at night. Unfortunately, they only have one flashlight and it only has enough light left for seventeen minutes. The bridge is too dangerous to cross without a flashlight, and it’s only strong enough to support two people at any given time. Each of the campers walks at a different speed. One can cross the bridge in 1 minute, another in 2 minutes, the third in 5 minutes, and the slow poke takes 10 minutes to cross. How do the campers make it across in 17 minutes? (The classic bridge-crossing problem.)
  • You are at a party with a friend and 10 people are present including you and the friend. your friend makes you a wager that for every person you find that has the same birthday as you, you get $1; for every person he finds that does not have the same birthday as you, he gets $2. would you accept the wager?
  • How many piano tuners are there in the entire world?
  • You have eight balls all of the same size. 7 of them weigh the same, and one of them weighs slightly more. How can you find the ball that is heavier by using a balance and only two weighings? (The classic weighing problem. There are many variants, but none is hard to answer.)
  • You have five pirates, ranked from 5 to 1 in descending order. The top pirate has the right to propose how 100 gold coins should be divided among them. But the others get to vote on his plan, and if fewer than half agree with him, he gets killed. How should he allocate the gold in order to maximize his share but live to enjoy it? (Hint: One pirate ends up with 98 percent of the gold.)
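  • You are given 2 eggs. You have access to a 100-story building. Eggs can be very hard or very fragile means it may break if dropped from the first floor or may not even break if dropped from 100th floor. Both eggs are identical. You need to figure out the highest floor of a 100-story building an egg can be dropped without breaking. The question is how many drops you need to make. You are allowed to break 2 eggs in the process. (One approach: drop the first egg from floors that are multiples of 3: 3, 6, 9, 12, and so on. If it breaks at floor 3n, drop the second egg from 3n-2 and then, if it survives, from 3n-1 to pin down the highest safe floor. This works, but it can take about 35 drops; spacing the first egg's drops better brings the worst case down to 14.)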
  • Describe a technical problem you had and how you solved it.
  • How would you design a simple search engine?
  • Design an evacuation plan for San Francisco.
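  • There’s a latency problem in South Africa. Diagnose it. (This question purely tests your problem-solving ability; there is no definite answer. The first step in solving a performance problem is usually to find the bottleneck, and there are many ways to do that: tools, bisection, timing logs, and so on.)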
  • What are three long term challenges facing Google?
  • Name three non-Google websites that you visit often and like. What do you like about the user interface and design? Choose one of the three sites and comment on what new feature or project you would work on. How would you design it?
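  • If there is only one elevator in the building, how would you change the design? How about if there are only two elevators in the building? (The classic elevator design problem. It has endless variations and mainly tests your design skills and how well you adapt to changing requirements; a hotel booking system is a similar exercise.)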
  • How many vacuum’s are made per year in USA?
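
For the 3:15 clock question in the list above, the arithmetic is easy to check with a small Python sketch. The function name `clock_angle` is my own, and the sketch assumes an ideal analog clock whose hour hand moves continuously.

```python
def clock_angle(hour, minute):
    """Smaller angle, in degrees, between the hour and minute hands."""
    minute_angle = minute * 6.0                     # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5  # hour hand drifts 0.5 degrees per minute
    diff = abs(hour_angle - minute_angle)
    return min(diff, 360.0 - diff)

print(clock_angle(3, 15))  # 7.5
```

The hour hand has moved a quarter of the way from 3 to 4 by 3:15, which is where the 7.5 degrees comes from.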
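
For the two-egg question in the list above, a common approach is to shrink the step of the first egg by one floor after every drop, so that the total number of drops stays constant wherever the first egg breaks. The sketch below is a minimal illustration of that idea; `min_drops` and `first_egg_floors` are hypothetical names of mine.

```python
def min_drops(floors):
    """Smallest d with d + (d - 1) + ... + 1 >= floors (two eggs available)."""
    d = 0
    covered = 0
    while covered < floors:
        d += 1
        covered += d
    return d

def first_egg_floors(floors):
    """Floors to try with the first egg: steps of d, d - 1, d - 2, ..."""
    step = min_drops(floors)
    plan, floor = [], 0
    while step > 0 and floor < floors:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan

print(min_drops(100))             # 14
print(first_egg_floors(100)[:4])  # [14, 27, 39, 50]
```

Once the first egg breaks, the second egg is dropped floor by floor upward from just above the last floor the first egg survived.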
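
The boys-and-girls question in the list above can also be checked empirically. This is only a simulation sketch under the assumption that each birth is independently a boy with probability 0.5; `simulate_boy_fraction` is a name I made up.

```python
import random

def simulate_boy_fraction(families, seed=0):
    """Fraction of boys when every family stops at its first boy."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(families):
        while True:
            if rng.random() < 0.5:  # boy: this family stops
                boys += 1
                break
            girls += 1              # girl: the family tries again
    return boys / (boys + girls)

print(simulate_boy_fraction(100000))  # close to 0.5
```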

Software Engineer
  • Why are manhole covers round? (There are many answers to this one; see Wikipedia.)
  • What is the difference between a mutex and a semaphore? Which one would you use to protect access to an increment operation?
  • A man pushed his car to a hotel and lost his fortune. What happened? (A brain teaser? Was he playing Monopoly?!)
  • Explain the significance of “dead beef”. (If you saw the hexadecimal value DEAD BEEF, what would you take it for? An IPv6 address? In practice 0xDEADBEEF is a well-known magic value used to mark memory while debugging.)
  • Write a C program which measures the speed of a context switch on a UNIX/Linux system.
  • Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7. (A classic; see StackOverflow.)
  • Describe the algorithm for a depth-first graph traversal.
  • Design a class library for writing card games. (A design question: model a card game with a set of classes.)
  • You need to check that your friend, Bob, has your correct phone number, but you cannot ask him directly. You must write the question on a card and give it to Eve who will take the card to Bob and return the answer to you. What must you write on the card, besides the question, to ensure Bob can encode the message so that Eve cannot read your phone number? (Protocol plus cryptography. My attempt: the card could say, "Bob, please hash my phone number with MD5 and compare the result with this string: XXXXXX. Are they the same?")
  • How are cookies passed in the HTTP protocol?
  • Design the SQL database tables for a car rental database.
  • Write a regular expression which matches a email address. (Search StackOverflow for similar questions.)
  • Write a function f(a, b) which takes two character string arguments and returns a string containing only the characters found in both strings in the order of a. Write a version which is order N-squared and one which is order N. (An algorithm question, not hard, so I will not go into it: one O(n^2) solution and one O(n) solution.)
  • You are given a the source to a application which is crashing when run. After running it 10 times in a debugger, you find it never crashes in the same place. The application is single threaded, and uses only the C standard library. What programming errors could be causing this crash? How would you test each one? (Related to random numbers? Or to time?)
  • Explain how congestion control works in the TCP protocol.
  • In Java, what is the difference between final, finally, and finalize?
  • What is multithreaded programming? What is a deadlock?
  • Write a function (with helper functions if needed) called to Excel that takes an excel column value (A,B,C,D…AA,AB,AC,… AAA..) and returns a corresponding integer value (A=1,B=2,… AA=27..).
  • You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describe how you would go about finding a good estimate of 1000 samples from this never ending set of data and then write code for it.
  • Tree search algorithms. Write BFS and DFS code, explain run time and space requirements. Modify the code to handle trees with weighted edges and loops with BFS and DFS, make the code print out path to goal state.
  • You are given a list of numbers. When you reach the end of the list you will come back to the beginning of the list (a circular list). Write the most efficient algorithm to find the minimum # in this list. Find any given # in the list. The numbers in the list are always increasing but you don’t know where the circular list begins, ie: 38, 40, 55, 89, 6, 13, 20, 23, 36. (Binary search on a rotated sorted array.)
  • Describe the data structure that is used to manage memory. (stack)
  • What’s the difference between local and global variables?
  • If you have 1 million integers, how would you sort them efficiently? (modify a specific sorting algorithm to solve this)
  • In Java, what is the difference between static, final, and const. (if you don’t know Java they will ask something similar for C or C++).
  • Talk about your class projects or work projects (pick something easy)… then describe how you could make them more efficient (in terms of algorithms).
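  • Suppose you have an NxN matrix of positive and negative integers. Write some code that finds the sub-matrix with the maximum sum of its elements. (I have seen the one-dimensional version of this before; this is the two-dimensional one. My feeling is that you compute the maximum-sum interval for the first row and then extend the analysis to two dimensions from there. That should be the idea, but the exact algorithm needs more thought.)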
  • Write some code to reverse a string.
  • Implement division (without using the divide operator, obviously). (Think about how long division is done by hand.)
  • Write some code to find all permutations of the letters in a particular string.
  • What method would you use to look up a word in a dictionary? (Use algorithms and data structures such as sorting, hashing, and trees.)
  • Imagine you have a closet full of shirts. It’s very hard to find a shirt. So what can you do to organize your shirts for easy retrieval?
  • You have eight balls all of the same size. 7 of them weigh the same, and one of them weighs slightly more. How can you find the ball that is heavier by using a balance and only two weighings?
  • What is the C-language command for opening a connection with a foreign host over the internet?
  • Design and describe a system/application that will most efficiently produce a report of the top 1 million Google search requests. These are the particulars: 1) You are given 12 servers to work with. They are all dual-processor machines with 4Gb of RAM, 4x400GB hard drives and networked together.(Basically, nothing more than high-end PC’s) 2) The log data has already been cleaned for you. It consists of 100 Billion log lines, broken down into 12 320 GB files of 40-byte search terms per line. 3) You can use only custom written applications or available free open-source software.
  • There is an array A[N] of N numbers. You have to compose an array Output[N] such that Output[i] will be equal to multiplication of all the elements of A[N] except A[i]. For example Output[0] will be multiplication of A[1] to A[N-1] and Output[1] will be multiplication of A[0] and from A[2] to A[N-1]. Solve it without division operator and in O(n). (Note that division is not allowed. The idea is that Output[i] = (product of everything to the left of A[i]) x (product of everything to the right of A[i]), so two loops suffice: the first stores the product of the elements left of A[i] in Output[i], and the second multiplies in the product of the elements to the right. The first loop accumulates a running product: for (r = 1, i = 0; i < n; i++) { Output[i] = r; r *= A[i]; }. The second loop works the same way, running from the right.)
  • There is a linked list of numbers of length N. N is very large and you don’t know N. You have to write a function that will return k random numbers from the list. Numbers should be completely random. Hint: 1. Use random function rand() (returns a number between 0 and 1) and irand() (return either 0 or 1) 2. It should be done in O(n). (Not hard. While traversing the list, generate a random number for each node and keep the K largest random numbers along with their nodes.)
  • Find or determine non existence of a number in a sorted list of N numbers where the numbers range over M, M>> N and N large enough to span multiple disks. Algorithm to beat O(log n) bonus points for constant time algorithm. (Use a bitmap; if a long integer has 64 bits, the bitmap needs M/64 of them.)
  • You are given a game of Tic Tac Toe. You have to write a function in which you pass the whole game and name of a player. The function will return whether the player has won the game or not. First you have to decide which data structure you will use for the game. You need to tell the algorithm first and then need to write the code. Note: Some position may be blank in the game. So your data structure should consider this condition also.
  • You are given an array [a1 To an] and we have to construct another array [b1 To bn] where bi = a1*a2*…*an/ai. you are allowed to use only constant space and the time complexity is O(n). No divisions are allowed. (Covered above.)
  • How do you put a Binary Search Tree in an array in a efficient manner. Hint :: If the node is stored at the ith position and its children are at 2i and 2i+1(I mean level order wise)Its not the most efficient way. (Traverse the tree in order.)
  • How do you find out the fifth maximum element in an Binary Search Tree in efficient manner. Note: You should not use use any extra space. i.e sorting Binary Search Tree and storing the results in an array and listing out the fifth element.
  • Given a Data Structure having first n integers and next n chars. A = i1 i2 i3 … iN c1 c2 c3 … cN.Write an in-place algorithm to rearrange the elements of the array as A = i1 c1 i2 c2 … in cn (The algorithm swaps elements outward from the middle: for (i = n-1; i >= 1; i--) { for (j = i; j < 2*n-i; j += 2) { swap(a[j], a[j+1]); } }. Sorry for writing it on one line.)
  • Given two sequences of items, find the items whose absolute number increases or decreases the most when comparing one sequence with the other by reading the sequence only once.
  • Given That One of the strings is very very long , and the other one could be of various sizes. Windowing will result in O(N+M) solution but could it be better? May be NlogM or even better?
  • How many lines can be drawn in a 2D plane such that they are equidistant from 3 non-collinear points?
  • Let’s say you have to construct Google maps from scratch and guide a person standing on Gateway of India (Mumbai) to India Gate(Delhi). How do you do the same?
  • Given that you have one string of length N and M small strings of length L. How do you efficiently find the occurrence of each small string in the larger one?
  • Given a binary tree, programmatically you need to prove it is a binary search tree.
  • You are given a small sorted list of numbers, and a very very long sorted list of numbers – so long that it had to be put on a disk in different blocks. How would you find those short list numbers in the bigger one?
  • Suppose you have given N companies, and we want to eventually merge them into one big company. How many ways are theres to merge?
  • Given a file of 4 billion 32-bit integers, how to find one that appears at least twice? (What I can think of: split the data into several smaller arrays, sort them, and then merge them step by step.)
  • Write a program for displaying the ten most frequent words in a file such that your program should be efficient in all complexity measures. (You may want to read the paper "Finding Frequent Items in Data Streams".)
  • Design a stack. We want to push, pop, and also, retrieve the minimum element in constant time.
  • Given a set of coin denominations, find the minimum number of coins to give a certain amount of change. (See the article "Coin Change Problem".)
  • Given an array, i) find the longest continuous increasing subsequence. ii) find the longest increasing subsequence. (Part i is not hard: the O(n) algorithm tracks the longest run seen so far while scanning.)
  • Write a function to find the middle node of a singly linked list. (The algorithm I can think of: keep two pointers p1 and p2; at each step p1 advances two nodes and p2 advances one, so when p1 reaches the end, p2 is at the middle.)
  • Given two binary trees, write a compare function to check if they are equal or not. Being equal means that they have the same value and same structure. (This is simple; use recursion.)
  • Implement put/get methods of a fixed size cache with LRU replacement algorithm.
  • You are given three sorted arrays (in ascending order), and you are required to find a triplet (one element from each array) such that the distance is minimum. Distance is defined like this: if a[i], b[j] and c[k] are three elements, then distance=max(abs(a[i]-b[j]),abs(a[i]-c[k]),abs(b[j]-c[k])). Please give a solution in O(n) time complexity. (Use three pointers a, b, c, each at the head of one array. Assume a[0]<b[0]<c[0]; advance a until a[i]>b[0], compute abs(a[i-1] – c[0]), and save the result in min. The situation now becomes the same search over a[i], b[0], c[0]; repeat the process, and whenever a new value is smaller than min, replace min with it.)
  • How does C++ deal with constructors and destructors of a class and its child class?
  • Write a function that flips the bits inside a byte (either in C++ or Java). Write an algorithm that take a list of n words, and an integer m, and retrieves the mth most frequent word in that list.
  • What’s 2 to the power of 64?
  • Given that you have one string of length N and M small strings of length L. How do you efficiently find the occurrence of each small string in the larger one? (What I can think of: sort the M small strings, then traverse the large string and look each window up among the M strings by binary search.)
  • How do you find the fifth maximum element in a Binary Search Tree in an efficient manner?
  • There is linked list of millions of node and you do not know the length of it. Write a function which will return a random number from the list.
  • You need to check that your friend, Bob, has your correct phone number, but you cannot ask him directly. You must write the question on a card and give it to Eve, who will take the card to Bob and return the answer to you. What must you write on the card, besides the question, to ensure Bob can encode the message so that Eve cannot read your phone number?
  • How long it would take to sort 1 trillion numbers? Come up with a good estimate.
  • Order the functions in order of their asymptotic performance: 1) 2^n 2) n^100 3) n! 4) n^n
  • There are some data represented by(x,y,z). Now we want to find the Kth least data. We say (x1, y1, z1) > (x2, y2, z2) when value(x1, y1, z1) > value(x2, y2, z2) where value(x,y,z) = (2^x)*(3^y)*(5^z). Now we can not get it by calculating value(x,y,z) or through other indirect calculations as lg(value(x,y,z)). How to solve it?
  • How many degrees are there in the angle between the hour and minute hands of a clock when the time is a quarter past three?
  • Given an array whose elements are sorted, return the index of the first occurrence of a specific integer. Do this in sub-linear time, i.e. do not just go through each element searching for that element.
  • Given two linked lists, return the intersection of the two lists: i.e. return a list containing only the elements that occur in both of the input lists. (Store the first list in a hash table, then traverse the second list. I don't know whether there is a better method.)
  • What’s the difference between a hashtable and a hashmap?
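  • If a person dials a sequence of numbers on the telephone, what possible words/strings can be formed from the letters associated with those numbers? (This question relates to American telephones; think of texting on a mobile phone, where pressing number keys produces letters. It is a combinatorics problem.)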
  • How would you reverse the image on an n by n matrix where each pixel is represented by a bit?
  • Create a fast cached storage mechanism that, given a limitation on the amount of cache memory, will ensure that only the least recently used items are discarded when the cache memory is reached when inserting a new item. It supports 2 functions: String get(T t) and void put(String k, T t).
  • Create a cost model that allows Google to make purchasing decisions by comparing the cost of purchasing more RAM for their servers vs. buying more disk space.
  • Design an algorithm to play a game of Frogger and then code the solution. The object of the game is to direct a frog to avoid cars while crossing a busy road. You may represent a road lane via an array. Generalize the solution for an N-lane road.
  • What sort would you use if you had a large data set on disk and a small amount of ram to work with?
  • What sort would you use if you required tight max time bounds and wanted highly regular performance.
  • How would you store 1 million phone numbers? (Phone numbers fall into ranges, so the shared range prefixes can be stored once; compare the Flyweight design pattern.)
  • Design a 2D dungeon crawling game. It must allow for various items in the maze – walls, objects, and computer-controlled characters. (The focus was on the class structures, and how to optimize the experience for the user as s/he travels through the dungeon.)
  • What is the size of the C structure below on a 32-bit system? On a 64-bit? (Mind the compiler's alignment padding.)

struct foo {
    char a;
    char* b;
};
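
One way to see the alignment effect hinted at in the struct question is to mirror the struct with Python's ctypes, which lays fields out using the native C rules of the platform the script runs on. This is only a sketch: the class name Foo is made up for illustration, and the numbers it prints describe the machine running it, not every compiler or ABI.

```python
import ctypes


class Foo(ctypes.Structure):
    # Mirrors `struct foo { char a; char *b; };` with native alignment.
    _fields_ = [
        ("a", ctypes.c_char),    # 1 byte
        ("b", ctypes.c_char_p),  # a pointer: typically 4 bytes on 32-bit, 8 on 64-bit
    ]


pointer_size = ctypes.sizeof(ctypes.c_void_p)
struct_size = ctypes.sizeof(Foo)
offset_of_b = Foo.b.offset  # padding after `a` moves `b` to a pointer-aligned offset

print(f"pointer size: {pointer_size}")
print(f"offset of b : {offset_of_b}")
print(f"sizeof(foo) : {struct_size}")
```

On common ABIs the padding after `a` makes the struct twice the pointer size rather than pointer size plus one.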

Software Engineer in Test
  • Efficiently implement 3 stacks in a single array.
  • Given an array of integers which is circularly sorted, how do you find a given integer.
  • Write a program to find depth of binary search tree without using recursion.
  • Find the maximum rectangle (in terms of area) under a histogram in linear time.
  • Most phones now have full keyboards. Before that, there were three letters mapped to each number button. Describe how you would go about implementing spelling and word suggestions as people type.
  • Describe recursive mergesort and its runtime. Write an iterative version in C++/Java/Python.
  • How would you determine if someone has won a game of tic-tac-toe on a board of any size?
  • Given an array of numbers, replace each number with the product of all the numbers in the array except the number itself *without* using division.
  • Create a cache with fast look up that only stores the N most recently accessed items.
  • How to design a search engine? If each document contains a set of keywords, and is associated with a numeric attribute, how to build indices?
  • Given two files that has list of words (one per line), write a program to show the intersection.
  • What kind of data structure would you use to index anagrams of words? e.g. if there exists the word “top” in the database, the query for “pot” should list that.
Quantitative Compensation Analyst

  • What is the yearly standard deviation of a stock given the monthly standard deviation?
  • How many resumes does Google receive each year for software engineering?
  • Anywhere in the world, where would you open up a new Google office and how would you figure out compensation for all the employees at this new office?
  • What is the probability of breaking a stick into 3 pieces and forming a triangle?
Engineering Manager
  • You’re the captain of a pirate ship, and your crew gets to vote on how the gold is divided up. If fewer than half of the pirates agree with you, you die. How do you recommend apportioning the gold in such a way that you get a good share of the booty, but still survive?
AdWords Associate
  • How would you work with an advertiser who was not seeing the benefits of the AdWords relationship due to poor conversions?
  • How would you deal with an angry or frustrated advertiser on the phone?
 
 
 
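Google's Criteria for Evaluating Blogs

I read through Google's patent on blog ranking; here is a summary.

Positive indicators:
  1. [0038] Subscription count

    The number of subscriptions to the blog across feed readers is counted; the more subscriptions, the higher the ranking. Some measures are also applied against "subscription spam", such as verifying that subscribers and IP addresses are unique.
  2. [0039] Search click-throughs

    How often the blog is clicked when it appears as a search result is counted; the more clicks, the higher the ranking.
  3. [0040] Appearances in other bloggers' blogrolls

    Bloggers usually keep a blogroll, a collection of links to other blogs. Across all blogrolls, the more links that point to a given blog, the higher its ranking.
  4. [0041] Links from high-quality blogrolls

    Links in a high-quality blogroll mostly point to well-known or trustworthy blogs.
  5. [0042] Links from the blogrolls of high-quality blogs

    The assumption here is that well-known or trustworthy bloggers do not link to spam blogs.
  6. [0043] Presence of tags

    If the author has analyzed the blog's content, categorized it, and added tags, that at least shows a conscientious attitude.
  7. [0044] Links from email and chat logs

    Links to the blog that appear in email bodies or chat logs earn extra credit. Gmail and Google Talk are put to use here.
  8. [0045] PageRank

    The higher the PageRank, the more important the blog. Because blogs update frequently, the newest post may not have a PageRank yet; in that case the PageRank of the blog it belongs to can be used instead.

Items [0040] to [0042] follow essentially the same pattern as the traditional PageRank computation between web pages, only restricted to links between blogs.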
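Negative indicators:
  1. [0047] Abnormal update frequency

    Updating too frequently, or on a very regular schedule, is treated as spamming and lowers the ranking. Anyone who likes to update their blog at a fixed time every day should take note.
  2. [0048] Mismatch between feed content and blog content

    To boost their ranking, spammers may put valuable content in the feed while filling the blog itself with ad links to unrelated content. To penalize this, the ranking is lowered when feed content and blog content do not match.
  3. [0049] Duplicate content

    Some spammers publish the same content repeatedly so that it shows up in feeds many times and for a long time. This is penalized.
  4. [0050] Too many spam terms

    Based on term-frequency statistics (bi-grams, tri-grams, and so on), the ranking is lowered if the proportion of spam terms in the blog content is too high.
  5. [0051] Most posts have similar length

    This mainly targets blogs generated automatically by machines.
  6. [0052] Abnormal links

    When most links in a blog point to a single page or a single external site, the blog is treated as spam and its ranking is lowered.
  7. [0053] Too many ads

    A blog page that contains too many ads is ranked lower.
  8. [0054] Ads inside the post body

    A typical blog page includes three kinds of content: recent posts, the blogroll, and metadata. If ads appear inside the post body, the ranking is lowered. I wonder whether AdSense ads get special treatment?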

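10 Little-Known Google Failures

  There is no doubt that Google is one of the most successful Internet companies in the world today, but Google has also released some failed experiments. Remember Google Accelerator, the free tool that claimed to speed up web browsing? Google Answers also had to leave the product line because of its paid model. Google Video could have ended up on this list too, but deep-pocketed Google's successful acquisition of YouTube bore out the saying: "If you can't beat a company, buy it." Of the 10 failed Google products below, some disappeared from public view after a single day; fortunately, some thoughtful people kept records of their existence.

  Google Labs may well hold even more short-lived experiments. Enough talk; let's take a look.

  1. Google X
  There are probably quite a few Apple fans among Google's programmers, and Google X was a version of the site modeled on the Mac OS X Dock interface. It appeared for just one day in 2005 and then vanished. No reason was given publicly, but it is not hard to guess that Google did not want the imitation to draw an infringement lawsuit from Apple. Quite a few sites imitating Google X still exist on the web, which is a tribute of sorts.

  2. Google Catalog (about to be withdrawn from the product line)
  Want the latest price for a USB flash drive? Search with Google Catalog and, sorry, the result you get is likely to be MicroWarehouse's 2001 product catalog, from an era when a 256MB flash drive cost as much as 595 US dollars. The most recent result for the keyword "laptop" is Cyberguys' spring 2006 price list. The idea behind the product was good; it just was never realized technically. Still, Google Catalog works well enough as an archive of old catalogs.

  3. Google Video Player
  The main strength of Google Video Player was its video list, which showed a random selection of other videos related to the one playing. It also let users jump to any part of a video, even a part that had not been buffered yet (many video sites now offer this). Its drawbacks were that the playlist often included content that had to be paid for, and that it did not support transferring videos to mobile devices. Google Video Player was withdrawn from the product line in August 2007.

  4. Google Web Accelerator
  Google Web Accelerator claimed to improve page load speed by 20 percent. Unfortunately, targeting broadband users was clearly a mistake, because their pages already loaded fast enough, and the tool could also intrude on your privacy. Critics at the time said Google was using the software to gather data for market research, since it could monitor which sites you visited and which products you bought while you browsed. Google still offers the software for download.

  5. Google Answers
  During the 5 years Google Answers existed, it offered users one model: ask a question, set a reward, get an answer. It amounted to a paid consulting platform and was popular for a while; note that the rewards were not virtual points as on Baidu Zhidao (百度知道) but real US dollars. However, because answerers often had trouble cashing out their rewards, and because free forum-style platforms such as Yahoo Answers existed, Google Answers ended up being shut down. You can still access the Google Answers database (try entering "What has happened to Answers" and see what you get).

  6. Google Coupons
  Comparing Google Coupons to a white rhinoceros is no exaggeration: it did exist, but very few people ever saw it. Google Coupons was basically an add-on to Google's Local Business Center service (which lets businesses place their company location on Google Maps). If someone found a business through Google Maps and that business used Google Coupons, the visitor would see an image resembling a coupon, which could be printed and used in the real world. But from the launch of Google Maps to the end of the Google Coupons service in 2006, did anyone actually see such a coupon?

  7. Google Voice Search
  Google Voice Search was born along with Google Labs as early as 2003. To use it, you dialed a dedicated search hotline, said what you wanted to search for, hung up, clicked the corresponding link, and got the search results. Searching the Internet this way was like phoning a friend before bed and asking him to drive over and brush your teeth for you. Still, the idea has since been carried forward by many sites (Baidu's voice search, for example), and Google itself has released a mobile version of voice search.

  8. Google Viewer
  As standalone software, Google Viewer let you enter a string of keywords and returned the search results as actual web pages, presented to the user as a slideshow. With advances in technology, there is no longer any need to implement this idea as software; Ask.com, Powerset, and Yahoo are the best examples. And so Google Viewer gradually left the stage.

  9. Google Checkout
  In June 2007, thousands of eBay sellers came to Boston for the annual eBay merchant conference. To compete with eBay's PayPal online payment system, Google planned to present Google Checkout to the sellers during the conference and persuade them not to use eBay's payment system. In the end eBay did not let Google get its way, but eBay paid a price as well: for a full week its ads were gone from Google's search result pages. Ultimately it was Google itself that called off this Internet-age Boston Tea Party.

  10. Orkut
  Many people will recognize the name Orkut; back in the day, walk into any reasonably large forum and you would be greeted by posts begging for an Orkut invitation. Orkut could have become today's Facebook or MySpace, but its invitation system annoyed a lot of people, and its two big shortcomings, the lack of support for blogging tools and the inability to upload videos, left Orkut trailing behind its competitors.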

Yahoo! PHP Written Test

1. Which of the following will not add john to the users array?

1. $users[] = 'john';
2. array_add($users,'john');
3. array_push($users,'john');
4. $users ||= 'john';

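Option 2: there is no array_add function.
Option 4: syntax error.

2. What's the difference between sort(), asort() and ksort()? Under what circumstances would you use each of these?

From the manual:

sort(): sorts an array. When the function returns, the elements are arranged from lowest to highest. Note: this function assigns new keys to the elements of the array; it removes the existing keys rather than just reordering them.

asort(): sorts an array and maintains index association.

ksort(): sorts an array by key, maintaining the key-to-data association.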

3. What would the following code print to the browser? Why?

$num = 10;
function multiply(){
$num = $num * 10;
}
multiply();
echo $num;

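10. One $num is a global variable and the other is a local variable. Unlike in C, a PHP global variable must be declared as global inside a function before the function can use it.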

4. What is the difference between a reference and a regular variable? How do you pass by reference & why would you want to?

5. What functions can you use to add library code to the currently running script?

6. What is the difference between foo() & @foo()?

7. How do you debug a PHP application?

8. What does === do? What's an example of something that will give true for '==', but not '==='?

9. How would you declare a class named “myclass” with no methods or properties?

10. How would you create an object, which is an instance of “myclass”?

11. How do you access and set properties of a class from within the class?

12. What is the difference between include & include_once? include & require?

13. What function would you use to redirect the browser to a new page?
1. redir()
2. header()
3. location()
4. redirect()

14. What function can you use to open a file for reading and writing?

1. fget();
2. file_open();
3. fopen();
4. open_file();

15. What's the difference between mysql_fetch_row() and mysql_fetch_array()?

16. What does the following code do? Explain what's going on there.
$date='08/26/2003';
print ereg_replace("([0-9]+)/([0-9]+)/([0-9]+)","\\2/\\1/\\3",$date);

17. Given a line of text $string, how would you write a regular expression to strip all the HTML tags from it?

18. What's the difference between the way PHP and Perl distinguish between arrays and hashes?

19. How can you get round the stateless nature of HTTP using PHP?

20. What does the GD library do?

21. Name a few ways to output (print) a block of HTML code in PHP?

22. Is PHP better than Perl? – Discuss.

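PHP Interview Question Collection

Question set 1
1. Use PHP to print the time one day ago, in the format 2006-5-10 22:21:21
2. The differences among echo(), print(), and print_r()
3. Templates that allow HTML and PHP to be kept separate
4. How do you make PHP and JSP interact?
5. Which tools do you use for version control?
6. How do you reverse a string?
7. Ways to optimize a MySQL database.
8. Discuss transaction handling
9. Ways to reach maximum load capacity with apache+mysql+php
10. How to truncate a Chinese string without producing garbled characters.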

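Question set 2

var $empty       = '';
var $null        = NULL;
var $bool        = FALSE;
var $notSet;
var $array       = array();

1.
$a = "hello";
$b = &$a;
unset($b);
$b = "world";
What is $a?

2.
$a = 1;
$x = &$a;
$b = $a++;
What is $b?

3.
$x = empty($array);
What is $x?   true    or    false

4. Have you used version control software? If so, which one?
5. Have you used a template engine? If so, which one?
6. Briefly describe the piece of development work you are proudest of.
7. For a high-traffic website, what approach do you take to handle the load?
8. Write PHP code that displays the client IP and the server IP: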

 

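Question set 3
I. PHP/MySQL programming
1) In a content management system, the table message has the following fields:
id article id
title article title
content article content
category_id article category id
hits hit count
Write the MySQL statement that creates this table.
2) In the same content management system, the table comment records user replies, with the following fields:
comment_id reply id
id article id, references id in the message table
comment_content reply content
Query the database to get a list of article titles in the format below, sorted by number of replies with the most-replied article first:
article id, article title, hit count, reply count
Do this in a single SQL statement; if an article has no replies, its reply count should show 0.
3) In the same content management system, the table category stores category information, with the following fields:
category_id int(4) not null auto_increment;
category_name varchar(40) not null;
When entering an article, the user picks its category from a drop-down menu.
Describe how to implement this drop-down menu.
II. PHP file operations
1)
In the same content management system, the system generates a static HTML page after a user submits content; describe the basic approach.
2) Briefly describe the workflow and basic approach for letting a user modify already-published content.
III. PHP programs
1) Write the output of the following program
<?
$b=201;
$c=40;
$a=$b>$c?4:5;
echo $a;
?>
2) Write the output of the following program
<?
$str="cd";
$$str="hotdog";
$$str.="ok";
echo $cd;
?>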

 

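Question set 4
I. Short-answer questions
1. Explain the difference between passing by value and passing by reference in PHP. When do you pass by value and when by reference?
2. What does the error_reporting function do in PHP?
3. Write a function that validates the format of an email address.
4. Briefly describe how to get the path of the currently executing script, including the parameters it received.
Note: for example, there is a script at www.domain.com that is passed param1, param2, param3….
The parameters may be passed by GET or by POST; write code that produces a result like
http://www.domain.com/script.php? param1=value1&param2=value2…..
5. How do you change the lifetime of a SESSION?
6. Given a web page address http://www.domain.com/xxx.php, how do you fetch its content?
7. Given a one-dimensional array of integers, write a function that sorts them in descending order. It must run efficiently; also explain how to improve its efficiency. (You must implement the function yourself; PHP's built-in functions are not allowed.)
8. Give examples of methods you have used during development to speed up page loading.
II. Database design question:
Design the database table structure for a library lending management system. It should record basic user information, book information, and borrow/return information; use no more than 6 tables; describe the table structure with tables (state each field's name, type, and meaning).
The database design should:
1. Guarantee the uniqueness of each user;
2. Guarantee the uniqueness of each book title; each title corresponds to a varying number of physical copies; guarantee the uniqueness of each copy;
3. In the lending table, account for both borrowing and returning, and for the loan period;
4. Guarantee referential integrity between the lending table and the user and book tables;
5. Limit the maximum number of books each user may borrow;
6. When a new user registers or a new book is added, generate its unique identifier automatically;
7. Support the following series of report requirements:
(Unless stated otherwise, you need not write the implementing statements, but the design must guarantee that each report can be produced with at most one SQL statement)
a) Daily statistics report: number of books borrowed and number of books returned that day;
b) Real-time reports:
i. For each title, the number of copies currently on loan and the number available;
ii. A list of all currently overdue books and users, with the number of days overdue
iii. The number of books currently borrowed by every user in the system, listed per user (including users who have borrowed nothing); write the SQL statement that implements this requirement:
Database application:
Write a series of SQL statements that describe a complete borrowing action and a complete returning action, and guarantee that the series executes atomically.
The next question is the most important test of ability; if you cannot complete it we will be unable to give an assessment! So please write a detailed answer and make sure the answer is a program that can be executed. Email the result to hr@88keke.com within two days.
Building on your design from part II, implement it with a database of your choice. Use a three-tier or multi-tier architecture and an object-oriented approach; if possible, design a template mechanism for it.
Feature: list the books currently on loan, ordered by date
No. User name Title Book ID Loan date
1. 张进 大染坊 12576587 2004-9-1
2. 刘兴 西游记 32131098 2004-9-2
……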

 

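Question set 5
1. In PHP, the name of the current script (without path and query string) is stored in the predefined variable (1); the URL that linked to the current page is stored in the predefined variable (2).
2. Executing <?php echo 8%(-3) ?> outputs (3).
3. In HTTP 1.0, status code 401 means (4); to return a "file not found" response you can use the header function, with the statement (5).
4. The array function arsort does (6); the statement error_reporting(2047) does (7).
5. The format of a database connection string in PEAR is (8).
6. Write a regular expression that filters out all JS/VBS scripts on a web page (that is, removes script tags together with their content): (9).
7. When PHP is installed as an Apache module, in the file httpd.conf you first use the statement (10) to load the PHP module dynamically, then the statement (11) so that Apache treats every file with the php extension as a PHP script.
8. The include and require statements can both include another file in the current file; the difference between them is (12). To avoid including the same file more than once, use the statement (13) in their place.
9. A class's properties can be serialized and saved to the session so that the whole object can be restored later; the functions used for this are (14).
10. A function's argument cannot be a reference to a variable unless (15) is set to on in php.ini.
11. In SQL, LEFT JOIN means (16). If tbl_user records students' names (name) and student numbers (ID), and tbl_score records the student number (ID), exam score (score), and exam subject (subject) of students (some students were expelled after the exam and have no record), then to print each student's name with the corresponding total score across subjects you can use the SQL statement (17).
12. In PHP, a heredoc is a special kind of string; its closing marker must (18).
13. Write a function that traverses all files and subfolders under a folder.
14. Briefly describe how unlimited-depth categories are implemented in a forum.
15. Design a web page that opens a full-screen window when loaded; the window contains a text box and a button. After the user types into the text box and clicks the button, the window closes and the entered text is displayed in the main page.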

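Question set 6

Given a table menu(mainmenu,submenu,url), write a tree menu using recursion that lists every menu.

 

Question set 7
1- Given three numbers, write a program that finds the largest.
2- Discuss the pros and cons of asp, php, and jsp
3- Discuss your understanding of MVC
4- Write the SQL that returns the names of the ten people with the most posts, using this table:
members(id,username,posts,pass,email)

Question set 8
1- How can JavaScript detect whether a window has been blocked?
2- Describe how sessions work
3- Given the array $a=array(4,3,8,9,2); sort it and list it in ascending order.
4- SQL injection vulnerabilities are usually prevented with the _____ function.
5- Write SQL that counts the users online and can handle abnormal disconnects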

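LexisNexis (律商联讯) Latest Interview Questions

LexisNexis PHP Developer Interview Questions

About LexisNexis

LexisNexis® is a world-leading provider of legal, news, and business information solutions. Its flagship products include the web-based Lexis® and Nexis® information services, aimed mainly at professionals in law, risk management, business, government, accounting, and academia. A subsidiary of the Reed Elsevier group, LexisNexis has offices in more than 100 countries and 13,000 employees.
LexisNexis currently has offices in Beijing and Shanghai and offers a wide range of products and services, including the LexisNexis Chinese information site (research.lexisnexis.com.cn), the LexisNexis lawyer search site (findalawyer.cn), Lexis.com, Nexis.com, the LexisNexis Academic, Environmental, Statistical, and Congressional collections, imported legal books and journals, and legal conferences and training. Nearly 100 universities in Hong Kong, Macau, and Taiwan have chosen LexisNexis, and more than 100 universities in mainland China have become LexisNexis customers.

Interview notes:

1. The personal history form is entirely in English, so you should be able to read common English resume terms (watch for abbreviations). You may fill it in in Chinese.
2. The PHP interview questions are entirely in English, so you should be able to understand the questions.

Interview process:

1. Fill in the personal history form
2. Answer the PHP interview questions
3. Interview with the examiners (one man and one woman; the man is from the technical department, the woman from human resources)

Interview questions:

Note: the original questions are in English; what follows is my own paraphrase.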

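1. Which of the following statements will not add 'john' to the array $user?

A. $user[] = 'john';

B. array_add($user, 'john');

C. array_push($user, 'john');

D. $user ||= 'john';

2. Compare the usage of the three functions sort(), assort(), and ksort() and the situations each is used in.

3. What does the following code output, and why?

$num = 10;
function foo(){
$num = $num * 10;
}
foo();
echo $num;

4. The difference between a reference and a regular variable.

5. Ways to load a class library.

6. The difference between foo() and @foo().

7. How do you usually debug PHP code?

8. What does === do? Give an example where == returns true but === returns false.

9. Declare a class with no methods or properties.

10. Create an instance of the object myclass.

11. Access or set a property from inside a class.

12. The differences and similarities between include and include_once, and between include and require.

13. Which function redirects the browser request?

14. Open a file for reading and writing.

15. The difference between mysql_fetch_row and mysql_fetch_array.

16. Interpret an ereg_replace call that involves regex captures and backreferences.

17. Write a regular expression that filters out all HTML code.

18. Compare arrays and hashes in PHP and Perl

19. How to get around stateless nature of HTTP using PHP ? (I misunderstood this one)

20. What is GD used for?

21. Give several ways to output a block of HTML code in PHP.

22. Is PHP better than Perl? Discuss.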

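Questions asked by the interviewers in person:

1. Work history, job responsibilities, and the situation at your previous company (asked by the female interviewer).

2. How long have you used PHP? Which area are you strongest in?

3. How long have you used MySQL? Which area are you strongest in?

4. Compare the two commonly used MySQL engines (MyISAM and InnoDB).

5. Optimize the following SQL statement from every angle you can: SELECT * FROM tablename WHERE id IN (13, 15, 18, 19) and age=21 ORDER BY address DESC

6. Do you know XML? Describe its format rules.

7. Have you parsed XML files with PHP? What method did you use? What API does that method use? (I did not catch what the last part meant)

8. Write a regular expression

9. Take the sequence 3k+1, where k is a non-negative integer, and multiply its terms from 1 up to 7000. How many zeros does the result end with?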
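
For the last question, one possible reading is that the product runs over 1 × 4 × 7 × … × 7000, that is, every term of the form 3k+1 up to 7000. Under that assumption (the interview notes do not spell the question out further), the number of trailing zeros equals the smaller of the total exponents of 2 and 5 across all terms. A minimal Python sketch, with function names made up for illustration:

```python
def factor_count(n: int, p: int) -> int:
    """Return how many times the prime p divides n."""
    count = 0
    while n % p == 0:
        n //= p
        count += 1
    return count


def trailing_zeros_3k_plus_1(limit: int) -> int:
    """Trailing zeros of the product of all terms 3k+1 (k >= 0) up to limit."""
    twos = fives = 0
    for term in range(1, limit + 1, 3):  # 1, 4, 7, ...
        twos += factor_count(term, 2)
        fives += factor_count(term, 5)
    # Each trailing zero needs one factor 2 paired with one factor 5.
    return min(twos, fives)


print(trailing_zeros_3k_plus_1(7000))
```

Counting prime factors avoids building the full product, which has thousands of digits.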

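Answers from readers:


1.

Which of the following will not add john to the users array?
$users[] = 'john';
Successfully adds john to the users array.
array_add($users,'john');
The function array_add() is not defined.
array_push($users,'john');
Successfully adds john to the users array.
$users ||= 'john';
Syntax error.
What is the difference between sort(), assort(), and ksort()? Under what circumstances would you use each?
sort()
Sorts by the values of the array's elements, in alphabetical order, and renumbers the keys from 0 to n-1. Mainly used to sort an array when the key values do not matter.
assort()
PHP has no assort() function, so this is probably a typo for asort().
asort()
Sorts the elements alphabetically like sort(), except that all keys are preserved; especially suited to sorting associative arrays.
ksort()
Sorts by the array's keys, in alphabetical order; especially suited to associative arrays whose keys you want sorted.
What does the following code produce, and why?
$num = 10;
function multiply(){
$num = $num * 10;
}
multiply();
echo $num;
Because the function multiply() does not declare $num as a global variable (for example with global $num or $GLOBALS['num']), the value of $num is 10.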
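What is the difference between a reference and a regular variable? How do you pass by reference, and when would you need to?
A reference passes the variable's address rather than its value, so when a function changes the variable's value, the whole application sees the new value.
A regular variable passes its value to the function; when the function changes it, only that function sees the new value, and the rest of the application still sees the old one.
$myVariable = "its' value";
Myfunction(&$myVariable); // pass the argument by reference (call-time form; removed in PHP 5.4, where the & goes in the function's parameter list instead)
Passing an argument to a function by reference lets the variable keep the value the function assigned even after the function returns.
Which functions can be used to add library code to the currently running script?
Different readings of this question give different answers. My first thought was that including PHP libraries comes down to include(), include_once(), require(), and require_once(). On reflection, though, "library" should also cover COM objects and .NET libraries, so the answer should include com_load and dotnet_load as well. Next time someone mentions "libraries", don't forget these two functions.
What is the difference between foo() and @foo()?
foo() executes the function, and any parse, syntax, or runtime errors are displayed on the page.
@foo() executes the function while hiding all of the error messages above.
Many applications use @mysql_connect() and @mysql_query to hide MySQL error messages. I consider this a serious mistake: errors should not be hidden; you must handle them properly and, where possible, fix them.
How do you debug a PHP application?
I don't do this often. I have tried many debugging tools, and setting them up on Linux is not easy at all. Below, though, is one debugger that has been getting attention lately.
PHP - Advanced PHP Debugger, or PHP - APD. The first step is to install it with the following command:
pear install apd
After installation, add the following statement at the start of your script to begin debugging:
apd_set_pprof_trace();
When execution finishes, open the following file to view the execution log:
apd.dumpdir
You can also use pprofp to format the log.
For details see http://us.php.net/manual/en/ref.apd.php.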
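What is "==="? Give an example where "==" is true but "===" is false.
"===" is meant for functions that can return the boolean false but can also return a non-boolean value that evaluates to false; strpos() and strrpos() are two examples.
The second part of the question takes a little more thought. The reverse case cannot occur, because whenever "===" is true, "==" is true as well. Here is an example where "==" is true but "===" is false:
if (strpos("abc", "a") == false)
{
// This branch runs: "a" is at position 0, and 0 converts to the boolean false
}
if (strpos("abc", "a") === false)
{
// This branch never runs: "===" also compares types, so the integer 0 returned by strpos() is not identical to the boolean false
}
How would you define a class myclass with no member functions or properties?
class myclass
{
}
How do you create an object of myclass?
$obj = new myclass();
It doesn't get simpler than that.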

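2.

1. Which of the following statements will not add 'john' to the array $user?

A. $user[] = 'john';

B. array_add($user, 'john');

C. array_push($user, 'john');

D. $user ||= 'john'; // this one

2. Compare the usage of the three functions sort(), assort(), and ksort() and the situations each is used in.
// Sorting; I don't know much about it.
3. What does the following code output, and why?

$num = 10;
function foo(){
$num = $num * 10;
}
foo();
echo $num;
//10

4. The difference between a reference and a regular variable.
// A reference records a memory address
5. Ways to load a class library.
//__autoload()
6. The difference between foo() and @foo().
// Suppresses errors
7. How do you usually debug PHP code?
// With a browser
8. What does === do? Give an example where == returns true but === returns false.
// Strict equality. With $a = '1': $a === 1 is false, $a == 1 is true
9. Declare a class with no methods or properties.
class cls{}
10. Create an instance of the object myclass.
new cls()
11. Access or set a property from inside a class.
class cls{
function cls(){
$this->name = 'abc';
}
}
12. The differences and similarities between include and include_once, and between include and require.
// Included only once versus possibly several times in a loop; the level of error raised
13. Which function redirects the browser request?

14. Open a file for reading and writing.
//fopen
15. The difference between mysql_fetch_row and mysql_fetch_array.
// Numeric keys versus associative keys
16. Interpret an ereg_replace call that involves regex captures and backreferences.
// Regex replacement.
17. Write a regular expression that filters out all HTML code.
// Not comfortable with regex.
18. Compare arrays and hashes in PHP and Perl
// Don't know.
19. How to get around stateless nature of HTTP using PHP ? (I misunderstood this one)
// The HTTP protocol?
20. What is GD used for?
// Image processing.
21. Give several ways to output a block of HTML code in PHP.
//htmlentities()
22. Is PHP better than Perl? Discuss.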

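Questions asked by the interviewers in person:

1. Work history, job responsibilities, and the situation at your previous company (asked by the female interviewer).
Under the leadership of my code, my previous company has been named a designated code supplier by Obama.
2. How long have you used PHP? Which area are you strongest in?
One year; PHP basics
3. How long have you used MySQL? Which area are you strongest in?
One year; queries
4. Compare the two commonly used MySQL engines (MyISAM and InnoDB).
Transaction support
5. Optimize the following SQL statement from every angle you can: SELECT * FROM tablename WHERE id IN (13, 15, 18, 19) and age=21 ORDER BY address DESC
// I would talk it through slowly.
6. Do you know XML? Describe its format rules.
No
7. Have you parsed XML files with PHP? What method did you use? What API does that method use? (I did not catch what the last part meant)
An XML class
8. Write a regular expression
eregi('[0-9]+',$str)
9. Take the sequence 3k+1, where k is a non-negative integer, and multiply its terms from 1 up to 7000. How many zeros does the result end with?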

PHPME_CMS Development Notes

1.2009-02-15

Advanced use of LEFT JOIN queries

SELECT a.*,b.surports,b.againsts,count(c.id) FROM phpme_news as a left join phpme_digg as b on(b.content_id=a.id and b.cat_id=a.cat_id) left join phpme_comment as c on(a.id=c.content_id and a.cat_id=c.cat_id) where a.cat_id in(45) group by id order by id desc limit 0,10
 
 
2.2009-02-19
Advanced use of UNION ALL queries
select z.* from (SELECT a.*,b.cat_name FROM phpme_news as a left join phpme_category as b on(a.cat_id=b.cat_id) where tags like '% 测试标签 %'
union all
SELECT a.*,b.cat_name FROM phpme_article as a left join phpme_category as b on(a.cat_id=b.cat_id) where tags like '% 测试标签 %'
union all
SELECT a.*,b.cat_name FROM phpme_download as a left join phpme_category as b on(a.cat_id=b.cat_id) where tags like '% 测试标签 %') as z order by z.time desc limit 0,10
 
 
3.2009-02-21
Getting the number of search results
select sum(total) from (select count(*) as total from phpme_news as a where title like '%php atr%' union all select count(*) as total from phpme_news as a where title like '%php%' union all select count(*) as total from phpme_news as a where title like '%atr%' ) as z
 
Getting the search results
select z.* from(SELECT a.*,b.surports,b.againsts,count(c.id),d.cat_name FROM phpme_news as a left join phpme_digg as b on (b.content_id=a.id and b.cat_id=a.cat_id) left join phpme_comment as c on(a.id=c.content_id and a.cat_id=c.cat_id) left join phpme_category as d on(d.cat_id=a.cat_id) where a.deleted!=1 and a.title like '%巴西 总统%' group by a.id union SELECT a.*,b.surports,b.againsts,count(c.id),d.cat_name FROM phpme_news as a left join phpme_digg as b on (b.content_id=a.id and b.cat_id=a.cat_id) left join phpme_comment as c on(a.id=c.content_id and a.cat_id=c.cat_id) left join phpme_category as d on(d.cat_id=a.cat_id) where a.deleted!=1 and a.title like '%巴西%' group by id union SELECT a.*,b.surports,b.againsts,count(c.id),d.cat_name FROM phpme_news as a left join phpme_digg as b on (b.content_id=a.id and b.cat_id=a.cat_id) left join phpme_comment as c on(a.id=c.content_id and a.cat_id=c.cat_id) left join phpme_category as d on(d.cat_id=a.cat_id) where a.deleted!=1 and a.title like '%总统%' group by id ) as z order by z.time desc limit 0,20
 
About aliasing the count aggregate function
SELECT a.*,b.surports,b.againsts,count(c.id) as comment,d.cat_name FROM phpme_news as a left join phpme_digg as b on (b.content_id=a.id and b.cat_id=a.cat_id) left join phpme_comment as c on(a.id=c.content_id and a.cat_id=c.cat_id) left join phpme_category as d on(d.cat_id=a.cat_id) where a.deleted!=1 group by id order by id desc limit 0,20
 
 
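4.2009-02-22
On SELECT union queries across several tables with different columns.
Because the tables have different columns, UNION ALL cannot be applied to them directly; it reports an error.
This is how I solved it: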
 
select z.* from (
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) FROM phpme_news as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) where tags like '% 测试标签 %' group by id
 union all
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) FROM phpme_article as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) where tags like '% 测试标签 %' group by id
 union all
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) FROM phpme_download as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) where tags like '% 测试标签 %' group by id
 ) as z order by z.time desc limit 0,10
 
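That is, do not select everything from each table; select only the columns you care about, so that the three tables return the same columns.
Do not use a.*.
 
 
Getting related content for DIGG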
select z.* from (
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) as comment FROM phpme_news as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) group by id
 union all
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) as comment FROM phpme_article as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) group by id
 union all
 SELECT a.id,a.title,a.time,a.clicked,a.author,a.cat_id,a.thumb,a.content,a.tags,a.content_url,b.cat_name,c.surports,c.againsts,count(d.id) as comment FROM phpme_download as a left join phpme_category as b on(a.cat_id=b.cat_id) left join phpme_digg as c on (c.content_id=a.id and c.cat_id=a.cat_id) left join phpme_comment as d on(a.id=d.content_id and a.cat_id=d.cat_id) group by id
 ) as z ,phpme_digg as y where z.id=y.content_id and z.cat_id=y.cat_id order by z.surports desc,z.againsts asc,z.id desc limit 0,10

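5.2009-03-16
It seems SYU was right: I was still using LIKE for site search. I only now realize that this is not a question of efficiency, of millions or tens of millions of rows, or of optimization; LIKE is simply not suited to a search engine. MySQL's full-text search handles Chinese poorly, but I believe there is always a workaround, as in PHPCMS, DEDECMS, DISCUZ!, and the rest of that line of powerful PHP applications, and I think I can manage it too.
By the way, DISCUZ! currently uses LIKE just as I do, so its search is not very efficient either...
For now I will probably spend more time testing PHPME_CMS. Although I will need it later for job hunting or for my own use, I am in no hurry to develop PHPME_BLOG;
take it slowly, give it some more time, slowly...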

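MySQL full-text search handles Chinese poorly.
I have tested it, and that is indeed the case.
The search string must be 4 characters or longer, for example '中亚地区'.
A search for '中国' returns no results.
Still, retrieval is definitely more than an order of magnitude faster;
try it on a table with a large amount of data.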

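Why set MySQL's ft_min_word_len=1?
MySQL has supported full-text indexing since version 4.0, but its default minimum indexed word length is 4. That default is reasonable for English, but the vast majority of Chinese words are 2 characters long, so any word shorter than 4 characters cannot be indexed and full-text indexing becomes useless. Most hosting providers in China have probably not noticed this problem and have not changed MySQL's default setting.
Why use a full-text index?
Database search is usually done with the SQL LIKE statement. A LIKE pattern with a leading wildcard (such as '%word%') cannot use an index, so every query scans from the first row to the last, which is extremely inefficient. Once the data passes roughly 100,000 rows, or too many users are online at once, LIKE queries can bring the database down. This is why many applications offer title search only: searching the content would be slower still, and a few tens of thousands of rows would already be too much.
MySQL's full-text index exists specifically for fuzzy queries. It indexes a whole article by word in advance, searches efficiently, and can support retrieval over millions of rows.
If you run your own server, change the setting right away; don't let this feature go to waste.
If you use shared hosting, contact your hosting provider right away and ask them to change the configuration. First, MySQL's default value is simply a wrong setting for Chinese, and changing it corrects the error. Second, the change is simple and takes only a few minutes, and more efficient search also lowers the odds of the provider's database going down. If you send this article to your hosting provider, I believe most of them will be willing to make the change.

How to set it:
Ask the server administrator to edit my.ini (my.cnf on Linux) and add the line "ft_min_word_len=1" under [mysqld], then restart MySQL. After that, log in to the site's admin backend (Module Management -> Site-wide Search) and rebuild the full-text index; otherwise site-wide search will not work.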

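6.2009-03-17

Initial development of the PHPME_CMS search engine is done. It is a search engine specific to PHPME_CMS that combines design ideas from PHPCMS and DEDECMS.
MySQL's support for Chinese is poor: whatever the Chinese encoding, anything shorter than 4 characters cannot be full-text searched.
I worked around that as follows.
First, the Chinese content is segmented into words. Segmentation is the key step: a custom segmentation dictionary can be defined, and text is segmented according to each entry's ranking weight in the dictionary.
I created a table that collects the keywords users search for; the collected keywords are then turned into dictionary entries in the admin backend. To screen out malicious users,
a keyword's search count must exceed a configurable threshold before it is accepted as a dictionary entry.
Some key points about segmentation, using this example:
中国人
Typical segmentation: 中国 国人
Better segmentation: 中国 人
My segmentation: 中国人 (the ranking weight of "中国人" is higher than that of "中国")

An article's title and content are combined, duplicate tokens are removed, and the remaining tokens are converted to ASCII codes.
Why not convert to location codes (区位码)? Because I prefer ASCII.
This way even a three-byte word such as "PHP" can be found.
Each UTF-8 encoded Chinese character yields a 6-digit ASCII code; codes are separated by spaces, which satisfies the requirements of MySQL full-text search.
Searching is very fast; even a search over tens of millions of rows takes less than a second!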
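
The segmentation-and-encoding idea described above can be sketched in Python. This is an illustration under stated assumptions, not the PHPME_CMS code: the function names, the sample dictionary and its weights are all hypothetical, segmentation is done by picking the highest-weighted dictionary word at each position, and the "6-digit ASCII code" is taken to mean the hexadecimal form of a character's three UTF-8 bytes.

```python
def segment(text: str, weights: dict) -> list:
    """Split text into tokens, preferring the highest-weighted dictionary word."""
    max_len = max(map(len, weights), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Every dictionary word that starts at position i is a candidate.
        candidates = [
            text[i:i + n]
            for n in range(1, max_len + 1)
            if text[i:i + n] in weights
        ]
        # Highest weight wins; an unknown character becomes its own token.
        word = max(candidates, key=weights.__getitem__) if candidates else text[i]
        tokens.append(word)
        i += len(word)
    return tokens


def encode_token(token: str) -> str:
    """Hex-encode the token's UTF-8 bytes (6 hex digits per Chinese character)."""
    return token.encode("utf-8").hex()


def build_index_text(title: str, content: str, weights: dict) -> str:
    """Segment title and content, drop duplicates, and join the encoded tokens."""
    tokens = segment(title, weights) + segment(content, weights)
    unique = dict.fromkeys(t for t in tokens if t.strip())  # keeps first-seen order
    return " ".join(encode_token(t) for t in unique)


# Hypothetical dictionary: "中国人" outranks "中国", as in the example above.
sample_weights = {"中国人": 10, "中国": 5, "国人": 3, "PHP": 1}
print(segment("中国人", sample_weights))
print(build_index_text("中国人", "PHP 中国人", sample_weights))
```

Because every encoded token is at least six ASCII characters long, even a two-character Chinese word or a three-letter word such as PHP clears MySQL's default minimum word length of 4; the search query would need to be segmented and encoded the same way before matching.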