PHP guide: finding files

In this article I will introduce a code snippet for listing the files in a directory with PHP.

Using PHP's glob function to list the files in a directory
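The snippet that originally followed this heading is missing from the page. As a stand-in, here is a minimal sketch of the usual pattern. The helper name `list_with_sizes` is made up for illustration, and `*.txt` is only an example pattern.

```php
<?php
// Hypothetical helper: describe every regular file matching a glob pattern.
function list_with_sizes(string $pattern): array
{
    $lines = [];
    // glob() can return false on error, so fall back to an empty array.
    foreach (glob($pattern) ?: [] as $filename) {
        if (is_file($filename)) {
            // basename() drops the directory part of the match.
            $lines[] = basename($filename) . " size " . filesize($filename);
        }
    }
    return $lines;
}

// Print every .txt file in the current working directory.
foreach (list_with_sizes("*.txt") as $line) {
    echo $line, "\n";
}
```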



The above example will output something similar to:

funclist.txt size 44686
funcsummary.txt size 267625
quickref.txt size 137820

Notes

Note: This function will not work on remote files as the file to be examined must be accessible via the server's filesystem.

Note: This function isn't available on some systems (e.g. old Sun OS).

See Also

  • opendir() - Open directory handle
  • readdir() - Read entry from directory handle
  • closedir() - Close directory handle
  • fnmatch() - Match filename against a pattern

crayonviolent at phpfreaks dot com

13 years ago

Since I feel this is rather vague and non-helpful, I thought I'd make a post detailing the mechanics of the glob regex.

glob uses two special symbols that act like sort of a blend between a meta-character and a quantifier.  These two characters are the * and ?

The ? matches 1 of any character except a /
The * matches 0 or more of any character except a /

If it helps, think of the * as the pcre equivalent of .* and ? as the pcre equivalent of the dot (.)

Note: * and ? function independently from the previous character. For instance, if you do glob("a*.php") on the following list of files, all of the files starting with an 'a' will be returned, but * itself would match:

a.php // * matches nothing
aa.php // * matches the second 'a'
ab.php // * matches 'b'
abc.php // * matches 'bc'
b.php // * matches nothing, because the starting 'a' fails
bc.php // * matches nothing, because the starting 'a' fails
bcd.php // * matches nothing, because the starting 'a' fails

It does not match just a.php and aa.php as a 'normal' regex would, because it matches 0 or more of any character, not the character/class/group before it.

Executing glob("a?.php") on the same list of files will only return aa.php and ab.php because as mentioned, the ? is the equivalent of pcre's dot, and is NOT the same as pcre's ?, which would match 0 or 1 of the previous character.

glob's regex also supports character classes and negative character classes, using the syntax [] and [^]. It will match any one character inside [] or match any one character that is not in [^].

With the same list above, executing

glob("[ab]*.php") will return (all of them):
a.php  // [ab] matches 'a', * matches nothing
aa.php // [ab] matches 'a', * matches 2nd 'a'
ab.php // [ab] matches 'a', * matches 'b'
abc.php // [ab] matches 'a', * matches 'bc'
b.php // [ab] matches 'b', * matches nothing
bc.php // [ab] matches 'b', * matches 'c'
bcd.php // [ab] matches 'b', * matches 'cd'

glob("[ab].php") will return a.php and b.php

glob("[^a]*.php") will return:
b.php // [^a] matches 'b', * matches nothing
bc.php // [^a] matches 'b', * matches 'c'
bcd.php // [^a] matches 'b', * matches 'cd'

glob("[^ab]*.php") will return nothing because the character class will fail to match on the first character.

You can also use ranges of characters inside the character class by having a starting and ending character with a hyphen in between.  For example, [a-z] will match any letter between a and z, [0-9] will match any (one) number, etc.
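The wildcard and character-class rules described so far can be checked against a throwaway directory. This is only a sketch: `make_demo_dir` and `glob_names` are hypothetical helpers, and `[!a]` is used as the POSIX spelling of the negated class because `[^a]` may not be accepted by every platform's glob implementation.

```php
<?php
// Hypothetical helper: create a temporary directory containing empty files.
function make_demo_dir(array $names): string
{
    $dir = sys_get_temp_dir() . "/glob_demo_" . uniqid();
    mkdir($dir);
    foreach ($names as $name) {
        touch("$dir/$name");
    }
    return $dir;
}

// Hypothetical helper: run $pattern inside $dir and return bare file names.
function glob_names(string $dir, string $pattern): array
{
    // glob() can return false on error, so fall back to an empty array.
    return array_map('basename', glob("$dir/$pattern") ?: []);
}

$dir = make_demo_dir(["a.php", "aa.php", "ab.php", "abc.php", "b.php", "bc.php", "bcd.php"]);

print_r(glob_names($dir, "a*.php"));     // a.php, aa.php, ab.php, abc.php
print_r(glob_names($dir, "a?.php"));     // aa.php, ab.php
print_r(glob_names($dir, "[ab].php"));   // a.php, b.php
print_r(glob_names($dir, "[!a]*.php"));  // b.php, bc.php, bcd.php
print_r(glob_names($dir, "[a-b]?.php")); // aa.php, ab.php, bc.php
```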

glob also supports limited alternation with {n1, n2, etc.}.  You have to specify GLOB_BRACE as the 2nd argument for glob in order for it to work.  So for example, if you executed glob("{a,b,c}.php", GLOB_BRACE) on the following list of files:

a.php
b.php
c.php

all 3 of them would return.  Note: using alternation with single characters like that is the same thing as just doing glob("[abc].php").  A more interesting example would be glob("te{xt,nse}.php", GLOB_BRACE) on:

tent.php
text.php
test.php
tense.php

text.php and tense.php would be returned from that glob.
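A small sketch of the alternation example, assuming a hypothetical helper `brace_glob_names`. GLOB_BRACE is not defined on some non-GNU systems, so the helper checks for it first; returning an empty array in that case is an arbitrary choice made for this illustration.

```php
<?php
// Hypothetical helper: brace-expanding glob that returns bare file names.
function brace_glob_names(string $dir, string $pattern): array
{
    if (!defined('GLOB_BRACE')) {
        // Assumption: an empty result is preferable to a fatal error
        // on platforms that do not provide GLOB_BRACE.
        return [];
    }
    // glob() can return false on error, so fall back to an empty array.
    return array_map('basename', glob("$dir/$pattern", GLOB_BRACE) ?: []);
}
```

Calling `brace_glob_names($dir, "te{xt,nse}.php")` on a directory holding the four files above should give text.php and tense.php.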

glob's regex does not offer any kind of quantification of a specified character or character class or alternation.  For instance, if you have the following files:

a.php
aa.php
aaa.php
ab.php
abc.php
b.php
bc.php

with pcre regex you can do ~^a+\.php$~ to return

a.php
aa.php
aaa.php

This is not possible with glob.  If you are trying to do something like this, you can first narrow it down with glob, and then get exact matches with a full flavored regex engine.  For example, if you wanted all of the php files in the previous list that only have one or more 'a' in it, you can do this:
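The snippet from the original comment is not preserved here. A minimal sketch of the same narrow-then-filter idea, with a hypothetical helper name `only_a_files`:

```php
<?php
// Hypothetical helper: names in $dir that consist of one or more 'a' plus ".php".
function only_a_files(string $dir): array
{
    $files = [];
    // Step 1: glob narrows the candidates to names starting with 'a'.
    foreach (glob("$dir/a*.php") ?: [] as $path) {
        $name = basename($path);
        // Step 2: PCRE applies the quantifier that glob cannot express.
        if (preg_match('~^a+\.php$~', $name)) {
            $files[] = $name;
        }
    }
    return $files;
}
```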



glob also does not support lookbehinds, lookaheads, atomic groupings, capturing, or any of the 'higher level' regex functions.

glob does not support 'shortkey' meta-characters like \w or \d.

uramihsayibok, gmail, com

13 years ago

Those of you with PHP 5 don't have to come up with these wild functions to scan a directory recursively: the SPL can do it.
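The original snippet is missing from this page. A hedged sketch of recursive scanning with the SPL iterators follows; `list_recursive` is a hypothetical helper name, and SKIP_DOTS is added here so the "." and ".." entries are left out.

```php
<?php
// Hypothetical helper: collect the path of every file below $root.
function list_recursive(string $root): array
{
    $paths = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries.
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        // SELF_FIRST yields each directory before its children.
        RecursiveIteratorIterator::SELF_FIRST
    );
    foreach ($iterator as $file) {
        // $file is an SplFileInfo object; keep regular files only.
        if ($file->isFile()) {
            $paths[] = $file->getPathname();
        }
    }
    return $paths;
}
```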



Not to mention the fact that $file will be an SplFileInfo class, so you can do powerful stuff really easily:
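Again the original snippet is missing. Here is a sketch of what working with the `SplFileInfo` objects can look like, assuming a hypothetical helper `report_sizes`. Its output format loosely mirrors the listing that follows, but prints full paths.

```php
<?php
// Hypothetical helper: print one line per file and return the total size in bytes.
function report_sizes(string $root): int
{
    $size = 0;
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            // SplFileInfo exposes size and modification time directly.
            echo $file->getPathname(), ": ", $file->getSize(), " B; modified ",
                date("Y-m-d", $file->getMTime()), "\n";
            $size += $file->getSize();
        }
    }
    return $size;
}
```

A caller could finish with `echo "\nTotal file size: ", report_sizes("/path"), " bytes\n";`, where "/path" is a placeholder.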



\Luna\luna.msstyles: 4190352 B; modified 2008-04-13
\Luna\Shell\Homestead\shellstyle.dll: 362496 B; modified 2006-02-28
\Luna\Shell\Metallic\shellstyle.dll: 362496 B; modified 2006-02-28
\Luna\Shell\NormalColor\shellstyle.dll: 361472 B; modified 2006-02-28
\Luna.theme: 1222 B; modified 2006-02-28
\Windows Classic.theme: 3025 B; modified 2006-02-28

Total file size: 5281063 bytes

Sam Bryan

10 years ago

glob is case sensitive, even on Windows systems.

It does support character classes though, so a case insensitive version of


could be written as
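The two patterns from this comment did not survive on this page. To illustrate the idea with an assumed example, an extension such as `csv` can be made case-insensitive by expanding each letter into a two-character class. `ci_pattern` is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: turn "csv" into "[cC][sS][vV]".
function ci_pattern(string $text): string
{
    $out = '';
    foreach (str_split($text) as $ch) {
        // Letters become a class holding both cases; other characters are kept.
        $out .= ctype_alpha($ch)
            ? '[' . strtolower($ch) . strtoupper($ch) . ']'
            : $ch;
    }
    return $out;
}

// Example pattern (the directory name is a placeholder):
echo 'my/dir/*.' . ci_pattern('csv'), "\n"; // my/dir/*.[cC][sS][vV]
```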

redcube at gmx dot de

16 years ago

Please note that glob('*') ignores all 'hidden' files by default. This means it does not return files that start with a dot (e.g. ".file").
If you want to match those files too, you can use "{,.}*" as the pattern with the GLOB_BRACE flag.



Note: This also returns the directory special entries . and ..
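The comment's snippet is missing here. A sketch of the `{,.}*` pattern, with the two special entries filtered out afterwards; `glob_with_hidden` is a hypothetical helper and it assumes GLOB_BRACE is available on the platform.

```php
<?php
// Hypothetical helper: visible and hidden entries of $dir, without "." and "..".
function glob_with_hidden(string $dir): array
{
    // "{,.}*" expands to "*" and ".*", so dotfiles are included.
    $names = array_map('basename', glob("$dir/{,.}*", GLOB_BRACE) ?: []);
    // Drop the special directory entries that ".*" also matches.
    return array_values(array_diff($names, ['.', '..']));
}
```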

eric at muyser dot com

14 years ago

As a follow up to recursively determining all paths (by viajy at yoyo dot org) and opendir being faster than glob (by Sam Yong - hellclanner at live [dot] com).

The list all dirs code didn't seem to work, at least on my server (provided by parazuce [at] gmail [dot] com).

I needed a function to create an unlimited multidimensional array, with the names of the folders/files intact (no realpath's, although that is easily possible). This is so I can simply loop through the array, create an expandable link on the folder name, with all the files inside it.

This is the correct way to recurse I believe (no static, return small arrays to build up the multidimensional array), and includes a check for files/folders beginning with dots.

// may need modifications

function list_files($path)
{
    $files = array();

    if (is_dir($path))
    {
        if ($handle = opendir($path))
        {
            while (($name = readdir($handle)) !== false)
            {
                // skip entries beginning with a dot (".", ".." and hidden files)
                if (!preg_match("#^\.#", $name))
                {
                    if (is_dir($path . "/" . $name))
                    {
                        // sub-folder: recurse and store its listing under its name
                        $files[$name] = list_files($path . "/" . $name);
                    }
                    else
                    {
                        $files[] = $name;
                    }
                }
            }

            closedir($handle);
        }
    }

    return $files;
}

print_r(list_files("/path/to/folder"));

// example usage

function list_html($list)
{
    $html = "";

    foreach ($list as $folder => $file)
    {
        if (is_array($list[$folder]))
        {
            $html .= "> (folder) " . $folder . "\n";
            $html .= list_html($list[$folder]);
        }
        else
        {
            $html .= " (file) " . $file . "\n";
        }
    }

    return $html;
}

echo list_html(list_files("/path/to/folder"));

Ultimater at gmail dot com

11 years ago

glob() isn't limited to one directory:
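The original snippet is missing at this point. A sketch of matching one pattern across several directories in a single call, using a hypothetical helper `glob_dirs`; it assumes GLOB_BRACE is available and that the directory names contain no commas or braces.

```php
<?php
// Hypothetical helper: apply $pattern inside each directory in $dirs.
function glob_dirs(array $dirs, string $pattern): array
{
    // Build "{dir1/pattern,dir2/pattern}" with no spaces around the commas.
    $parts = array_map(
        function ($dir) use ($pattern) {
            return rtrim($dir, '/') . '/' . $pattern;
        },
        $dirs
    );
    return glob('{' . implode(',', $parts) . '}', GLOB_BRACE) ?: [];
}
```

For example, `glob_dirs(['includes', 'core'], '*.php')` builds the pattern `{includes/*.php,core/*.php}`.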



Just be careful when using GLOB_BRACE regarding spaces around the comma:
{includes/*.php,core/*.php} works as expected, but
{includes/*.php, core/*.php}, with a space after the comma, matches the former as expected but not the latter,
unless you have a directory named " core" (with a leading space) on your machine.
PHP can create such directories quite easily like so:
mkdir(" core");

ni dot pineau at gmail dot com

9 years ago

Note that if you use braces with glob, you might retrieve duplicated entries for files that match more than one item:
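The pattern used in the original comment is not preserved on this page. As an assumed illustration, a pattern such as `*{foo,bar}*.dat` is expanded into `*foo*.dat` and `*bar*.dat`, and a name containing both words is returned once per alternative. The hypothetical helper `glob_brace_unique` below removes such repeats with array_unique(); it assumes GLOB_BRACE is available.

```php
<?php
// Hypothetical helper: brace-expanding glob with duplicate matches removed.
function glob_brace_unique(string $pattern): array
{
    // Each brace alternative is matched separately, so a path can repeat.
    $matches = glob($pattern, GLOB_BRACE) ?: [];
    // array_unique() keeps the first occurrence; array_values() reindexes.
    return array_values(array_unique($matches));
}
```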



Result :
Array
(
    [0] => /path/file_foo.dat
    [1] => /path/file_foobar.dat
    [2] => /path/file_foobar.dat
)

Anonymous

1 year ago

Include dotfiles excluding . and .. special dirs with .[!.]*
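A sketch of how that pattern can be combined with a plain `*`, using a hypothetical helper `glob_all_entries`. One caveat of this pattern: a name that begins with two dots (for example "..foo") is not matched, because `[!.]` requires the second character to be something other than a dot.

```php
<?php
// Hypothetical helper: visible entries plus dotfiles, never "." or "..".
function glob_all_entries(string $dir): array
{
    $visible = glob("$dir/*") ?: [];
    // ".[!.]*" needs a dot followed by a non-dot character.
    $hidden = glob("$dir/.[!.]*") ?: [];
    return array_map('basename', array_merge($visible, $hidden));
}
```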

david dot schueler at tel-billig dot de

11 years ago

Don't use glob() to list files in a directory that holds a very large number of files (>100,000). You get an "Allowed memory size of XYZ bytes exhausted ..." error.
You may try to increase the memory_limit setting in php.ini. Mine is set to 128MB and the script still reaches this limit while glob()ing over 500,000 files.

The more stable way is to use readdir() on very large numbers of files:
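The snippet from the comment is missing here. A minimal sketch of the streaming approach, in which entries are handled one at a time instead of being collected into one large array; `count_entries` is a hypothetical helper, and counting stands in for whatever per-file work is needed.

```php
<?php
// Hypothetical helper: visit every entry of $path without building a list.
function count_entries(string $path): int
{
    $count = 0;
    $handle = opendir($path);
    if ($handle === false) {
        return 0; // directory could not be opened
    }
    // readdir() returns one name per call, and false when the listing ends.
    while (false !== ($file = readdir($handle))) {
        if ($file === '.' || $file === '..') {
            continue; // readdir() also returns the special entries
        }
        $count++; // do something with $file here
    }
    closedir($handle);
    return $count;
}
```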

r dot hartung at roberthartung dot de

13 years ago

You can use multiple asterisks with the glob() function.

Example:
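The example itself is missing from this page. Judging by the paths listed afterwards, the pattern was probably of the form `my/*/dir/*.php`; that is an assumption, and `glob_nested` below is a hypothetical helper built around it.

```php
<?php
// Hypothetical helper: PHP files in every "my/<anything>/dir" below $base.
function glob_nested(string $base): array
{
    // Each * matches within a single path segment; it never crosses a "/".
    return glob("$base/my/*/dir/*.php") ?: [];
}
```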



$paths will contain paths such as the following:

- my/1/dir/xyz.php
- my/bar/dir/bar.php
- my/bar/dir/foo.php

heavyraptor at gmail dot com

14 years ago

glob() (array_sum() and array_map() in fact too) can be very useful if you want to calculate the sum of all the files' sizes located in a directory:
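The one-liner from the comment is missing here. A sketch of the same combination, wrapped in a hypothetical helper `dir_size`; the is_file filter is an addition so that sub-directories are not counted.

```php
<?php
// Hypothetical helper: total size in bytes of the files directly inside $dir.
function dir_size(string $dir): int
{
    // Keep regular files only; glob("*") also returns sub-directories.
    $files = array_filter(glob("$dir/*") ?: [], 'is_file');
    // filesize() runs once per file and array_sum() adds the results.
    return (int) array_sum(array_map('filesize', $files));
}
```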



Unfortunately there's no way to do this recursively, using glob() (as far as I know).

nataxia at gmail dot com

15 years ago

Something I used to sort dir & subdir into array [multidimensional] reflecting dir structure.

    // written as a class method, hence the $this-> call
    function getRecursiveFolderList($curDir, $currentA = false)
      {
        $dirs = glob($curDir . '/*', GLOB_ONLYDIR);

        $cur = 0;
        foreach ($dirs as $dir)
          {
            $currentA[$cur]['path'] = $dir;
            $currentA[$cur] = $this->getRecursiveFolderList($dir, $currentA[$cur]);

            ++$cur;
          }

        return $currentA;
      }

nuntius

13 years ago

First off, it's nice to see all of the different takes on this. Thanks for all of the great examples.

Fascinated by the foreach usage I was curious how it might work with a for loop. I found that glob was well suited for this, especially compared to opendir.  The for loop is always efficient when you want to protect against a potential endless loop.

$dir=$_SERVER['DOCUMENT_ROOT']."/test/directory_listing/test";
    echo $dir;
    $filesArray=glob($dir."/*.*");

    $line.="\n\n";
    $line.=print_r($filesArray, true);
    $line.="\n";
    $line.="";

    for($i=0;$i
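The loop above is cut off on this page, so its body is not known. A hedged sketch of a complete `for` loop over glob results follows; the helper name `numbered_listing` and the "index - path" output format are assumptions made for illustration.

```php
<?php
// Hypothetical helper: one "index - path" line per file matching "*.*" in $dir.
function numbered_listing(string $dir): string
{
    $filesArray = glob($dir . "/*.*") ?: [];
    $line = "";
    // The bound is computed once, so the loop runs a fixed number of times.
    $total = count($filesArray);
    for ($i = 0; $i < $total; $i++) {
        $line .= $i . " - " . $filesArray[$i] . "\n";
    }
    return $line;
}
```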
