Linux Terminal: How to do fuzzy search with tre-agrep
Probably everyone who uses a terminal knows the grep command. From its man page:
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.
So this is the best tool for searching a big file for a specific pattern, or for finding a specific process in the complete list of running processes. It has one small limitation, though: it only finds exact matches for what you ask, and sometimes it would be useful to do an “approximate” or “fuzzy” search.
The program agrep was originally developed for this purpose. Wikipedia gives some details about it:
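To make the difference between exact and approximate search concrete, here is a minimal Python sketch. It is only an illustration, not how grep or agrep work internally. It computes the smallest number of single-character edits (insertions, deletions, substitutions) needed to turn the pattern into some substring of a line, using the classic dynamic-programming table for approximate substring matching, and keeps the lines under a threshold. The names `min_edits_in_line`, `fuzzy_grep` and the `max_errors` parameter are hypothetical.

```python
from typing import List


def min_edits_in_line(pattern: str, line: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `line`."""
    m = len(pattern)
    # prev[i]: cheapest way to match pattern[:i] ending at the previous
    # character. Before reading the line, pattern[:i] costs i (all deleted).
    prev = list(range(m + 1))
    best = prev[m]
    for ch in line:
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the line
                         cur[i - 1] + 1)      # pattern character missing
        best = min(best, cur[m])
        prev = cur
    return best


def fuzzy_grep(pattern: str, lines: List[str], max_errors: int = 1) -> List[str]:
    """Keep the lines that contain `pattern` with at most `max_errors` edits."""
    return [line for line in lines if min_edits_in_line(pattern, line) <= max_errors]


lines = ["the quick brown fox", "a quikc brown fox", "lazy dog"]
print([line for line in lines if "quick" in line])  # ['the quick brown fox']
print(fuzzy_grep("quick", lines))  # ['the quick brown fox', 'a quikc brown fox']
```

The misspelled line “a quikc brown fox” is missed by the exact test but found with one allowed error, because its substring “quik” is a single deletion away from “quick”.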
agrep (approximate grep) is a proprietary approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system. It was later ported to OS/2, DOS, and Windows.
It selects the best-suited algorithm for the current query from a variety of the known fastest (built-in) string searching algorithms, including Manber and Wu’s bitap algorithm based on Levenshtein distances.
agrep is also the search engine in the indexer program GLIMPSE. agrep is free for private and non-commercial use only, and belongs to the University of Arizona.
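The bitap idea mentioned in the quote can be sketched in a few lines of Python. This is a minimal illustration of the shift-and variant with up to k Levenshtein errors, following the Wu-Manber recurrence; it is not agrep's actual source. The function name `bitap_fuzzy_search` is hypothetical, and it reports the index in the text where each approximate match ends.

```python
from typing import List


def bitap_fuzzy_search(text: str, pattern: str, k: int) -> List[int]:
    """Return the end indices in `text` where `pattern` occurs with at most k
    errors (insertions, deletions, substitutions)."""
    if not pattern or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")
    m = len(pattern)
    full = (1 << m) - 1        # keep only m bits per vector
    match_bit = 1 << (m - 1)   # set when the whole pattern has matched

    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors. Initially the first d pattern characters can be
    # "matched" by deleting them.
    R = [(1 << d) - 1 for d in range(k + 1)]

    ends = []
    for j, c in enumerate(text):
        mask = masks.get(c, 0)
        prev_old = R[0]                        # R[d-1] before this step
        R[0] = ((R[0] << 1) | 1) & mask        # exact shift-and step
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = ((((cur_old << 1) | 1) & mask)  # characters match
                    | prev_old                     # extra character in text
                    | ((prev_old << 1) | 1)        # substitution
                    | ((R[d - 1] << 1) | 1)        # pattern character missing
                    ) & full
            prev_old = cur_old
        if R[k] & match_bit:
            ends.append(j)
    return ends


print(bitap_fuzzy_search("the quick brown fox", "quick", 0))  # [8]
print(bitap_fuzzy_search("a quikc brown fox", "quick", 1))    # [5]
```

Each error level keeps one bit vector, so the work per text character grows with k rather than with the pattern length. Python's unbounded integers let the pattern be any length, whereas a C implementation would typically limit the pattern to the machine word size.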
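So agrep is closed source, but luckily there is an open source alternative: tre-agrep, the agrep-like command-line tool that ships with the TRE library. The TRE project describes the library like this: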
TRE is a lightweight, robust, and efficient POSIX compliant regexp matching library with some exciting features such as approximate (fuzzy) matching.
The matching algorithm used in TRE uses linear worst-case time in the length of the text being searched, and quadratic worst-case time in the length of the used regular expression. In other words, the time complexity of the algorithm is O(M^2 N), where M is the length of the regular expression and N is the length of the text.
Read more at Linux Aria