關於String裡indexOf()的一些思考 – JAVA編程語言程序開發技術文章

這些天偶然翻起瞭算法書,看到KMP算法的時候,突然發現多年來似乎一直沒有完全搞明白,於是惡補瞭一下。寫瞭個程序,“基本”明白瞭fail函數和KMP的那些事~~~~
具體程序見下  1package test;
 2/** *//**
 3 * @author Jia Yu
 4 * @date   2010-9-28
 5 */
 6public class StringMatch {
 7
 8    private int []f;
 9    /** *//**
10     * KMP fail function to calculate f[]
11     * f[j] = k < j where k is the maximum of pat[0..k] == pat[j-k..j]
12     *         = -1     otherwise
13     * @param pat
14     */
15    public void fail(String pat){
16        int lenP = pat.length();
17        f = new int[lenP];
18        f[0] = -1;
19        for(int j=1;j<lenP;j++){
20            int i = f[j-1];
21            while((pat.charAt(j)==pat.charAt(i+1))&&(i>=0)) i = f[i];
22            if(pat.charAt(j)!=pat.charAt(i+1)) f[j] = i + 1;
23            else f[j] = -1;
24        }
25    }
26   
27    /** *//**
28     * Implementation of KMP algorithm.
29     * @param s string which is source string
30     * @param pat string pattern
31     * @return
32     */
33    public int kmp_find(String s,String pat){
34        int lenS,lenP;
35        lenS = s.length();
36        lenP = pat.length();
37        int i,j;
38        i=j=0;
39        while(i<lenS&&j<lenP){
40            if(s.charAt(i)==pat.charAt(j)){
41                i++;
42                j++;
43            }
44            else{
45                if(j==0) i++;
46                else j=f[j-1] +1;
47            }
48        }
49        if(j<lenP||lenP==0) return -1;
50        else return i-lenP;
51    }
52   
53    /** *//**
54     * @param args
55     */
56    public static void main(String[] args) {
57        // TODO Auto-generated method stub
58        StringMatch ss = new StringMatch();
59        String s = “abcdedabc”;
60        String pat = “dab”;
61        ss.fail(pat);
62        System.out.println(ss.kmp_find(s, pat));
63    }
64
65}
66


完事後忍不住想和String的indexOf比比性能。一直以為jdk 的src裡String的indexOf是用Naïve string search algorithm實現的。沒有任何技巧,當然也有好多人去拿這個說事,但是逛過一些論壇,記得人們隻是自己去實現KMP,然後說KMP有多好,但是很少有拿出數據來比對的。所以今天,在投簡歷閑暇,還是抽出些時間,看瞭看String match相關的一些東西。寫瞭一些代碼,發現瞭一些東東~~~~
不多扯閑話瞭。首先進行瞭一個KMP和String indexOf的比較,看看結果吧。
preprocess using 7549ns
=====================================================
Cycles  :      30000
String find pat in pos -1
Used Time is 87134ns
KMP find pat in pos -1
Used Time is 1071829ns
=====================================================
Cycles  :      90000
String find pat in pos -1
Used Time is 150480ns
KMP find pat in pos -1
Used Time is 2277475ns
=====================================================
Cycles  :     270000
String find pat in pos -1
Used Time is 380815ns
KMP find pat in pos -1
Used Time is 848257ns
=====================================================
Cycles  :     810000
String find pat in pos 119457
Used Time is 509997ns
KMP find pat in pos 119457
Used Time is 1141992ns
=====================================================
Cycles  :    2430000
String find pat in pos 459895
Used Time is 1845130ns
KMP find pat in pos 459895
Used Time is 4180643ns
測試代碼如下:
 1/** *//**
 2 *
 3 */
 4package test;
 5
 6import java.util.Random;
 7
 8/** *//**
 9 * @author Jia Yu
10 * @date 2010-9-28
11 */
12public class StringMatchTest2 {
13
14    private static String src;
15    private static String pat;
16    private static long cycles = 10000L;
17    private static Random rand = new Random(47);
18
19    public static void generateSource() {
20        StringBuilder sb = new StringBuilder();
21        int iter = (int) cycles;
22        for (int i = 0; i < iter; i++) {
23            sb.append((char) (a + (rand.nextInt(6))));
24        }
25        src = sb.toString();
26    }
27
28    public static void generatePattern() {
29        StringBuilder sb = new StringBuilder();
30        for (int i = 0; i < 7; i++) {
31            sb.append((char) (a + (rand.nextInt(6))));
32        }
33        pat = sb.toString();
34    }
35
36    /** *//**
37     * @param args
38     */
39    public static void main(String[] args) {
40        // TODO Auto-generated method stub
41        generatePattern();
42        StringMatch sm = new StringMatch();
43        long start, pre, dur;
44        start = System.nanoTime();
45        sm.fail(pat);
46        pre = System.nanoTime() – start;
47        System.out.println(”

發佈留言