Monday 6 August 2012

Extract Text from Word File with C#, VB.NET

In my last article, I introduced a pretty easy way to extract images from a Word file. Today, I will share a method to extract text from a Word document with C# and VB.NET. In our daily work, Word documents play an indispensable role. Word has powerful functions for formatting text, such as fonts, headers and footers, comments, hyperlinks and so on. Precisely because of this rich formatting, we sometimes need to extract only the content, without its formatting, into a .txt file, and to do so without using the document.SaveToFile method to save Word as text directly.

Easy way to extract text from a Word document with C#, VB.NET

I am very happy that Spire.Doc, an MS Word component, can help me finish this task. Using Spire.Doc, I only need three simple steps to implement text extraction. If necessary, you can download it freely and use the source code.
Procedure

Step 1. Create a new project

1.     Create a new project in Visual Studio and set its target framework to .NET Framework 4.

2.     Add the Spire.Doc DLL as a reference.

3.     Add the following namespace imports at the top of the code file.
C#
using System.IO;
using System.Text;
using Spire.Doc;
using Spire.Doc.Documents;

VB.NET
Imports System.IO
Imports System.Text
Imports Spire.Doc
Imports Spire.Doc.Documents

Step 2. Extract text from the Word document with C#, VB.NET

1.     Load a Word document from the file system.

C# CODE:
         Document doc = new Document();
         doc.LoadFromFile(@"D:\michelle\JaneEyre.doc", FileFormat.Doc);
VB.NET CODE:
         Dim doc As New Document()    
         doc.LoadFromFile("D:\michelle\JaneEyre.doc", FileFormat.Doc)

2.     Extract the text from the Word document, paragraph by paragraph.
C# Code:
            //create a StringBuilder to collect the extracted text
            StringBuilder sb = new StringBuilder();

            //append the text of every paragraph in every section
            foreach (Section section in doc.Sections)
            {
                foreach (Paragraph paragraph in section.Paragraphs)
                {
                    sb.AppendLine(paragraph.Text);
                }
            }

VB.NET Code:

         'create a StringBuilder to collect the extracted text
         Dim sb As New StringBuilder()

         'append the text of every paragraph in every section
         For Each section As Section In doc.Sections
             For Each paragraph As Paragraph In section.Paragraphs
                 sb.AppendLine(paragraph.Text)
             Next
         Next

Step 3. Save the text to a .txt file and open the .txt file.

C# Code:
            //write the text of the Word document into a txt file
            File.WriteAllText(@"result.txt", sb.ToString());

            //launch the text file
            System.Diagnostics.Process.Start(@"result.txt");
VB.NET Code:

         'write the text of the Word document into a txt file
         File.WriteAllText("result.txt", sb.ToString())

         'launch the text file
         System.Diagnostics.Process.Start("result.txt")
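
For convenience, the three steps can be put together in one small console program. This is only a minimal sketch: it uses the same Spire.Doc calls as the snippets in this article, while the class name WordTextExtractor, the helper methods JoinParagraphs and ReadParagraphs, the command-line arguments, and the UTF-8 encoding are my own assumptions for illustration. It leaves out the step that launches the text file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Spire.Doc;
using Spire.Doc.Documents;

// Hypothetical wrapper class; the name is not part of Spire.Doc.
static class WordTextExtractor
{
    // Joins paragraph strings, one per line.
    // Kept free of Spire.Doc types so it can be checked on its own.
    public static string JoinParagraphs(IEnumerable<string> paragraphs)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string text in paragraphs)
        {
            sb.AppendLine(text);
        }
        return sb.ToString();
    }

    // Loads a .doc file and yields the text of each paragraph,
    // using the same calls as the steps in this article.
    static IEnumerable<string> ReadParagraphs(string path)
    {
        Document doc = new Document();
        doc.LoadFromFile(path, FileFormat.Doc);

        foreach (Section section in doc.Sections)
        {
            foreach (Paragraph paragraph in section.Paragraphs)
            {
                yield return paragraph.Text;
            }
        }
    }

    static void Main(string[] args)
    {
        // Assumed defaults: the sample paths used in this article.
        string input = args.Length > 0 ? args[0] : @"D:\michelle\JaneEyre.doc";
        string output = args.Length > 1 ? args[1] : "result.txt";

        // Assumption: write as UTF-8 so non-ASCII characters survive.
        File.WriteAllText(output, JoinParagraphs(ReadParagraphs(input)), Encoding.UTF8);
        Console.WriteLine("Wrote " + output);
    }
}
```

Keeping the joining logic in its own method means it can be tried with plain strings before pointing the program at a real Word document.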
