diff options
Diffstat (limited to 'contrib/file/file.man')
-rw-r--r-- | contrib/file/file.man | 549 |
1 files changed, 549 insertions, 0 deletions
diff --git a/contrib/file/file.man b/contrib/file/file.man new file mode 100644 index 000000000000..1215e69e1645 --- /dev/null +++ b/contrib/file/file.man @@ -0,0 +1,549 @@ +.\" $File: file.man,v 1.79 2008/11/06 22:49:08 rrt Exp $ +.Dd October 9, 2008 +.Dt FILE __CSECTION__ +.Os +.Sh NAME +.Nm file +.Nd determine file type +.Sh SYNOPSIS +.Nm +.Op Fl bchikLnNprsvz +.Op Fl -mime-type +.Op Fl -mime-encoding +.Op Fl f Ar namefile +.Op Fl F Ar separator +.Op Fl m Ar magicfiles +.Ar file +.Nm +.Fl C +.Op Fl m Ar magicfile +.Nm +.Op Fl -help +.Sh DESCRIPTION +This manual page documents version __VERSION__ of the +.Nm +command. +.Pp +.Nm +tests each argument in an attempt to classify it. +There are three sets of tests, performed in this order: +filesystem tests, magic tests, and language tests. +The +.Em first +test that succeeds causes the file type to be printed. +.Pp +The type printed will usually contain one of the words +.Em text +(the file contains only +printing characters and a few common control +characters and is probably safe to read on an +.Dv ASCII +terminal), +.Em executable +(the file contains the result of compiling a program +in a form understandable to some +.Dv UNIX +kernel or another), +or +.Em data +meaning anything else (data is usually +.Sq binary +or non-printable). +Exceptions are well-known file formats (core files, tar archives) +that are known to contain binary data. +When modifying magic files or the program itself, make sure to +.Em "preserve these keywords" . +Users depend on knowing that all the readable files in a directory +have the word +.Sq text +printed. +Don't do as Berkeley did and change +.Sq shell commands text +to +.Sq shell script . +.Pp +The filesystem tests are based on examining the return from a +.Xr stat 2 +system call. +The program checks to see if the file is empty, +or if it's some sort of special file. +Any known file types appropriate to the system you are running on +(sockets, symbolic links, or named pipes (FIFOs) on those systems that +implement them) +are intuited if they are defined in +the system header file +.In sys/stat.h . +.Pp +The magic tests are used to check for files with data in +particular fixed formats. +The canonical example of this is a binary executable (compiled program) +.Dv a.out +file, whose format is defined in +.In elf.h , +.In a.out.h +and possibly +.In exec.h +in the standard include directory. +These files have a +.Sq "magic number" +stored in a particular place +near the beginning of the file that tells the +.Dv UNIX operating system +that the file is a binary executable, and which of several types thereof. +The concept of a +.Sq "magic" +has been applied by extension to data files. +Any file with some invariant identifier at a small fixed +offset into the file can usually be described in this way. +The information identifying these files is read from the compiled +magic file +.Pa __MAGIC__.mgc , +or the files in the directory +.Pa __MAGIC__ +if the compiled file does not exist. In addition, if +.Pa $HOME/.magic.mgc +or +.Pa $HOME/.magic +exists, it will be used in preference to the system magic files. +.Pp +If a file does not match any of the entries in the magic file, +it is examined to see if it seems to be a text file. +ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets +(such as those used on Macintosh and IBM PC systems), +UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC +character sets can be distinguished by the different +ranges and sequences of bytes that constitute printable text +in each set. +If a file passes any of these tests, its character set is reported. +ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified +as +.Sq text +because they will be mostly readable on nearly any terminal; +UTF-16 and EBCDIC are only +.Sq character data +because, while +they contain text, it is text that will require translation +before it can be read. +In addition, +.Nm +will attempt to determine other characteristics of text-type files. +If the lines of a file are terminated by CR, CRLF, or NEL, instead +of the Unix-standard LF, this will be reported. +Files that contain embedded escape sequences or overstriking +will also be identified. +.Pp +Once +.Nm +has determined the character set used in a text-type file, +it will +attempt to determine in what language the file is written. +The language tests look for particular strings (cf. +.In names.h +) that can appear anywhere in the first few blocks of a file. +For example, the keyword +.Em .br +indicates that the file is most likely a +.Xr troff 1 +input file, just as the keyword +.Em struct +indicates a C program. +These tests are less reliable than the previous +two groups, so they are performed last. +The language test routines also test for some miscellany +(such as +.Xr tar 1 +archives). +.Pp +Any file that cannot be identified as having been written +in any of the character sets listed above is simply said to be +.Sq data . +.Sh OPTIONS +.Bl -tag -width indent +.It Fl b , -brief +Do not prepend filenames to output lines (brief mode). +.It Fl c , -checking-printout +Cause a checking printout of the parsed form of the magic file. +This is usually used in conjunction with the +.Fl m +flag to debug a new magic file before installing it. +.It Fl C , -compile +Write a +.Pa magic.mgc +output file that contains a pre-parsed version of the magic file or directory. +.It Fl e , -exclude Ar testname +Exclude the test named in +.Ar testname +from the list of tests made to determine the file type. Valid test names +are: +.Bl -tag -width +.It apptype +.Dv EMX +application type (only on EMX). +.It text +Various types of text files (this test will try to guess the text encoding, irrespective of the setting of the +.Sq encoding +option). +.It encoding +Different text encodings for soft magic tests. +.It tokens +Looks for known tokens inside text files. +.It cdf +Prints details of Compound Document Files. +.It compress +Checks for, and looks inside, compressed files. +.It elf +Prints ELF file details. +.It soft +Consults magic files. +.It tar +Examines tar files. +.El +.It Fl f , -files-from Ar namefile +Read the names of the files to be examined from +.Ar namefile +(one per line) +before the argument list. +Either +.Ar namefile +or at least one filename argument must be present; +to test the standard input, use +.Sq - +as a filename argument. +.It Fl F , -separator Ar separator +Use the specified string as the separator between the filename and the +file result returned. Defaults to +.Sq \&: . +.It Fl h , -no-dereference +option causes symlinks not to be followed +(on systems that support symbolic links). This is the default if the +environment variable +.Dv POSIXLY_CORRECT +is not defined. +.It Fl i , -mime +Causes the file command to output mime type strings rather than the more +traditional human readable ones. Thus it may say +.Sq text/plain; charset=us-ascii +rather than +.Sq ASCII text . +In order for this option to work, file changes the way +it handles files recognized by the command itself (such as many of the +text file types, directories etc), and makes use of an alternative +.Sq magic +file. +(See the FILES section, below). +.It Fl -mime-type , -mime-encoding +Like +.Fl i , +but print only the specified element(s). +.It Fl k , -keep-going +Don't stop at the first match, keep going. Subsequent matches will be +have the string +.Sq "\[rs]012\- " +prepended. +(If you want a newline, see the +.Sq "\-r" +option.) +.It Fl L , -dereference +option causes symlinks to be followed, as the like-named option in +.Xr ls 1 +(on systems that support symbolic links). +This is the default if the environment variable +.Dv POSIXLY_CORRECT +is defined. +.It Fl m , -magic-file Ar list +Specify an alternate list of files and directories containing magic. +This can be a single item, or a colon-separated list. +If a compiled magic file is found alongside a file or directory, it will be used instead. +.It Fl n , -no-buffer +Force stdout to be flushed after checking each file. +This is only useful if checking a list of files. +It is intended to be used by programs that want filetype output from a pipe. +.It Fl N , -no-pad +Don't pad filenames so that they align in the output. +.It Fl p , -preserve-date +On systems that support +.Xr utime 2 +or +.Xr utimes 2 , +attempt to preserve the access time of files analyzed, to pretend that +.Nm +never read them. +.It Fl r , -raw +Don't translate unprintable characters to \eooo. +Normally +.Nm +translates unprintable characters to their octal representation. +.It Fl s , -special-files +Normally, +.Nm +only attempts to read and determine the type of argument files which +.Xr stat 2 +reports are ordinary files. +This prevents problems, because reading special files may have peculiar +consequences. +Specifying the +.Fl s +option causes +.Nm +to also read argument files which are block or character special files. +This is useful for determining the filesystem types of the data in raw +disk partitions, which are block special files. +This option also causes +.Nm +to disregard the file size as reported by +.Xr stat 2 +since on some systems it reports a zero size for raw disk partitions. +.It Fl v , -version +Print the version of the program and exit. +.It Fl z , -uncompress +Try to look inside compressed files. +.It Fl 0 , -print0 +Output a null character +.Sq \e0 +after the end of the filename. Nice to +.Xr cut 1 +the output. This does not affect the separator which is still printed. +.It Fl -help +Print a help message and exit. +.El +.Sh FILES +.Bl -tag -width __MAGIC__.mgc -compact +.It Pa __MAGIC__.mgc +Default compiled list of magic. +.It Pa __MAGIC__ +Directory containing default magic files. +.El +.Sh ENVIRONMENT +The environment variable +.Dv MAGIC +can be used to set the default magic file name. +If that variable is set, then +.Nm +will not attempt to open +.Pa $HOME/.magic . +.Nm +adds +.Sq .mgc +to the value of this variable as appropriate. +The environment variable +.Dv POSIXLY_CORRECT +controls (on systems that support symbolic links), whether +.Nm +will attempt to follow symlinks or not. If set, then +.Nm +follows symlink, otherwise it does not. This is also controlled +by the +.Fl L +and +.Fl h +options. +.Sh SEE ALSO +.Xr magic __FSECTION__ , +.Xr strings 1 , +.Xr od 1 , +.Xr hexdump 1, +.Xr file 1posix +.Sh STANDARDS CONFORMANCE +This program is believed to exceed the System V Interface Definition +of FILE(CMD), as near as one can determine from the vague language +contained therein. +Its behavior is mostly compatible with the System V program of the same name. +This version knows more magic, however, so it will produce +different (albeit more accurate) output in many cases. +.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html +.Pp +The one significant difference +between this version and System V +is that this version treats any white space +as a delimiter, so that spaces in pattern strings must be escaped. +For example, +.Bd -literal -offset indent +>10 string language impress\ (imPRESS data) +.Ed +.Pp +in an existing magic file would have to be changed to +.Bd -literal -offset indent +>10 string language\e impress (imPRESS data) +.Ed +.Pp +In addition, in this version, if a pattern string contains a backslash, +it must be escaped. +For example +.Bd -literal -offset indent +0 string \ebegindata Andrew Toolkit document +.Ed +.Pp +in an existing magic file would have to be changed to +.Bd -literal -offset indent +0 string \e\ebegindata Andrew Toolkit document +.Ed +.Pp +SunOS releases 3.2 and later from Sun Microsystems include a +.Nm +command derived from the System V one, but with some extensions. +My version differs from Sun's only in minor ways. +It includes the extension of the +.Sq & +operator, used as, +for example, +.Bd -literal -offset indent +>16 long&0x7fffffff >0 not stripped +.Ed +.Sh MAGIC DIRECTORY +The magic file entries have been collected from various sources, +mainly USENET, and contributed by various authors. +Christos Zoulas (address below) will collect additional +or corrected magic file entries. +A consolidation of magic file entries +will be distributed periodically. +.Pp +The order of entries in the magic file is significant. +Depending on what system you are using, the order that +they are put together may be incorrect. +If your old +.Nm +command uses a magic file, +keep the old magic file around for comparison purposes +(rename it to +.Pa __MAGIC__.orig ). +.Sh EXAMPLES +.Bd -literal -offset indent +$ file file.c file /dev/{wd0a,hda} +file.c: C program text +file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), + dynamically linked (uses shared libs), stripped +/dev/wd0a: block special (0/0) +/dev/hda: block special (3/0) + +$ file -s /dev/wd0{b,d} +/dev/wd0b: data +/dev/wd0d: x86 boot sector + +$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10} +/dev/hda: x86 boot sector +/dev/hda1: Linux/i386 ext2 filesystem +/dev/hda2: x86 boot sector +/dev/hda3: x86 boot sector, extended partition table +/dev/hda4: Linux/i386 ext2 filesystem +/dev/hda5: Linux/i386 swap file +/dev/hda6: Linux/i386 swap file +/dev/hda7: Linux/i386 swap file +/dev/hda8: Linux/i386 swap file +/dev/hda9: empty +/dev/hda10: empty + +$ file -i file.c file /dev/{wd0a,hda} +file.c: text/x-c +file: application/x-executable +/dev/hda: application/x-not-regular-file +/dev/wd0a: application/x-not-regular-file + +.Ed +.Sh HISTORY +There has been a +.Nm +command in every +.Dv UNIX since at least Research Version 4 +(man page dated November, 1973). +The System V version introduced one significant major change: +the external list of magic types. +This slowed the program down slightly but made it a lot more flexible. +.Pp +This program, based on the System V version, +was written by Ian Darwin <ian@darwinsys.com> +without looking at anybody else's source code. +.Pp +John Gilmore revised the code extensively, making it better than +the first version. +Geoff Collyer found several inadequacies +and provided some magic file entries. +Contributions by the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989. +.Pp +Guy Harris, guy@netapp.com, made many changes from 1993 to the present. +.Pp +Primary development and maintenance from 1990 to the present by +Christos Zoulas (christos@astron.com). +.Pp +Altered by Chris Lowth, chris@lowth.com, 2000: +Handle the +.Fl i +option to output mime type strings, using an alternative +magic file and internal logic. +.Pp +Altered by Eric Fischer (enf@pobox.com), July, 2000, +to identify character codes and attempt to identify the languages +of non-ASCII files. +.Pp +Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME +support and merge MIME and non-MIME magic, support directories as well +as files of magic, apply many bug fixes and improve the build system. +.Pp +The list of contributors to the +.Sq magic +directory (magic files) +is too long to include here. +You know who you are; thank you. +Many contributors are listed in the source files. +.Sh LEGAL NOTICE +Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999. +Covered by the standard Berkeley Software Distribution copyright; see the file +LEGAL.NOTICE in the source distribution. +.Pp +The files +.Dv tar.h +and +.Dv is_tar.c +were written by John Gilmore from his public-domain +.Xr tar 1 +program, and are not covered by the above license. +.Sh BUGS +.Pp +There must be a better way to automate the construction of the Magic +file from all the glop in Magdir. +What is it? +.Pp +.Nm +uses several algorithms that favor speed over accuracy, +thus it can be misled about the contents of +text +files. +.Pp +The support for text files (primarily for programming languages) +is simplistic, inefficient and requires recompilation to update. +.Pp +The list of keywords in +.Dv ascmagic +probably belongs in the Magic file. +This could be done by using some keyword like +.Sq * +for the offset value. +.Pp +Complain about conflicts in the magic file entries. +Make a rule that the magic entries sort based on file offset rather +than position within the magic file? +.Pp +The program should provide a way to give an estimate +of +.Sq how good +a guess is. +We end up removing guesses (e.g. +.Sq From\ +as first 5 chars of file) because +they are not as good as other guesses (e.g. +.Sq Newsgroups: +versus +.Sq Return-Path: +). +Still, if the others don't pan out, it should be possible to use the +first guess. +.Pp +This manual page, and particularly this section, is too long. +.Sh RETURN CODE +.Nm +returns 0 on success, and non-zero on error. +.Sh AVAILABILITY +You can obtain the original author's latest version by anonymous FTP +on +.Dv ftp.astron.com +in the directory +.Dv /pub/file/file-X.YZ.tar.gz |